(54) Title: GENOME OF LEGIONELLA PNEUMOPHILA PARIS AND LENS STRAIN - DIAGNOSTIC AND
EPIDEMIOLOGICAL APPLICATIONS

(57) Abstract: The subject of the invention is the genomic sequence and nucleotide sequences coding for polypeptides of
the Legionella pneumophila Paris and Lens strains, such as cell-surface polypeptides, especially those specific to one of
these two strains relative to the other and/or relative to the Philadelphia strain, or involved in virulence or in the
biosynthesis of cell-envelope polysaccharides, as well as vectors including said sequences and cells transformed by these
vectors. The invention also concerns processes for detection of these nucleic acids or polypeptides and diagnostic or typing
kits for bacteria of the Legionella genus, especially of the Legionella pneumophila species, such as the Paris and Lens
strains, relative to each other and/or relative to the Philadelphia strain. The invention especially concerns a repeated
nucleic sequence specific to the Legionella pneumophila species and its use as an analysis target in processes for detecting
the presence of these bacteria. A further aim of the invention is a method for selecting compounds capable of modulating the
biosynthesis of these cell-envelope polysaccharides using said nucleotide sequences or said polypeptides. Finally, the
invention comprises pharmaceutical compositions, especially vaccines, for the prevention and/or treatment of bacterial
infections, in particular by the Legionella pneumophila Paris strain and/or Lens strain.
GENOME OF LEGIONELLA PNEUMOPHILA PARIS AND LENS STRAIN -
DIAGNOSTIC AND EPIDEMIOLOGICAL APPLICATIONS

The subject of the invention is the genomic sequence and nucleotide sequences
coding for polypeptides of the Legionella pneumophila Paris and Lens strains, such as
cell-surface polypeptides, especially those specific to one of these two strains relative to
the other and/or relative to the Philadelphia strain, or involved in virulence or in the
biosynthesis of cell-envelope polysaccharides, as well as vectors including said sequences
and cells transformed by these vectors. The invention likewise concerns processes for
detection of these nucleic acids or polypeptides and diagnostic kits or kits for typing
bacteria of the Legionella genus, especially the Legionella pneumophila species, such as
the Paris and Lens strains, relative to each other and/or relative to the Philadelphia
strain. The invention especially concerns a repeated nucleic sequence specific to the
Legionella pneumophila species and its use as an analysis target in processes for detecting
the presence of these bacteria. A further aim of the invention is a method for selecting
compounds capable of modulating the biosynthesis of these cell-envelope polysaccharides
using said nucleotide sequences or said polypeptides. Finally, the invention comprises
pharmaceutical compositions, especially vaccines, for the prevention and/or treatment of
bacterial infections, in particular by the Legionella pneumophila Paris and/or Lens strain.

Legionella is an environmental bacterium responsible for legionellosis and
Pontiac fever. Epidemiological data indicate that probably only certain isolates are
capable of causing clinical cases. The L. pneumophila species appears to be more
virulent than the other species, being responsible for 90% of the cases of
legionellosis. Within this species, among the fifteen serogroups, the isolates of
serogroup 1 are associated with 80% of cases.

To date, transmission from person to person has never been observed, and
measures for preventing legionellosis are thus concentrated on eliminating this
pathogen from water circuits or from water-cooling towers in air-conditioning
systems. In order to establish a rational prevention policy, it is necessary to predict
the risk associated with each strain. With this in view, it would be desirable to have
processes or diagnostic kits available based on the recognition of a protein or
nucleic acid specific to this genus, to a pathogenic Legionella species, or to a
particular strain (or sub-species) of such a species.
Indeed, the interest in using these specific sequences in the field of
diagnostics or epidemiology rests on the possibility of analyzing a large number of
sequences at the same time and very rapidly for:

- classification or typing of bacteria as a function of the presence of a sequence
or of a profile of sequences characteristic of a genus, species or strain (sub-species) of
bacteria, in particular in association with the severity or otherwise of the pathologies
which such bacteria can induce in case of infection in mammals, especially in humans; and

- simultaneous comparison of a sequence or profile of sequences between different
genera, species or strains (sub-species) of bacteria, pathogenic or not, especially enabling
identification of a gene, or the corresponding protein sequence, or a profile of genes
whose presence and/or expression in a bacterium is specific to a genus, species or
strain of bacteria, and/or to its pathogenicity or otherwise, especially by means of tools
such as DNA chips or, if required, protein chips, on which these specific sequences are
immobilized.

These specific sequences can be sequences specific to the Legionella genus, or
to a pathogenic bacterium of the Legionella genus and/or of the Legionella pneumophila
species, or again to a bacterium of the Legionella pneumophila species Paris and/or Lens
strain, or again specific to a bacterium of the Legionella pneumophila species Paris and/or
Lens strain relative to the Philadelphia strain.

This information is widely used, especially to rapidly identify the presence or
absence of the pathogenic bacterium, the severity of the infection which it can cause, the
treatment adapted to an infection, and/or the necessity and the means to be put in place
to decontaminate the objects, circuits or fluids which are contaminated or could be
contaminated. This information will likewise be widely used for epidemiological studies
relating to this genus of bacteria.

This is just one of the aspects of the present invention. 

In a first stage, the inventors studied and sought to understand the
genomic diversity within the Legionella genus by complete sequencing of the L.
pneumophila serogroup 1 Paris and Lens strains, found in different French
departments (2), and low-coverage sequencing of two strains not belonging to
the L. pneumophila species. The strains selected are L. longbeachae, responsible for
cases of legionellosis essentially in Australia, and L. anisa, frequently found in water
circuits but not found in patients.
Within the scope of the invention, complete sequencing has likewise been carried
out for an epidemic L. pneumophila serogroup 1 strain known as the « Lens strain »,
responsible for a major epidemic in France with 86 cases and 17 deaths between
November 2003 and January 2004.

The genes found to be variable between the strains of Legionella, as well as
conserved genes having a functional link to the virulence of L. pneumophila, can be
used to manufacture DNA chips.

A large number of isolates of various origins, isolated from the environment or
originating from clinical cases, could be analyzed by using this tool to identify markers
enabling the two categories of strains to be discriminated. The comparison of endemic
and epidemic isolates will likewise provide a basis for understanding the specificities of
these strains and in particular the adaptability and stability of the Paris strain.

This approach also helps identify new functions necessary for the virulence of
Legionella in humans and aids understanding of the different stages of this disease.
The aim of the invention is to allow development of novel tools for typing the
strains of Legionella. These tools could be of the DNA "chip" type or of another type.
The novel characteristics of these typing tools will be the following:

* Rapidity and simplicity of use

* High capacity for discrimination between the strains

* Possibility of providing information on the genomic content of the strain analyzed
and possibly of preventing the risk associated with contamination by Legionella.

During this study, the inventors have brought to light genes found to be variable
between the strains of Legionella, as well as conserved genes having a functional link to
the cell wall or envelope, or to the virulence of L. pneumophila. These genes can be
used for carrying out these processes or producing these diagnostic kits, especially for
producing protein or DNA biochips.

A large number of isolates of diverse origin, isolated from the environment or
originating from clinical cases, could be analyzed by using these tools for the purpose of
identifying markers for discriminating the two categories of strains.

This approach will also help identify new functions necessary for the virulence
of Legionella in humans and aid understanding of the different stages of this disease.

The aim of the invention is thus to allow development of novel tools for
typing strains of Legionella. These tools could especially be of the protein or
DNA/RNA biochip type. The novel characteristics of these typing tools will be the
following:

- rapidity and simplicity of use;

- high capacity for discrimination between strains; and

- possibility of supplying information on the genomic content of the strain analyzed and
of possibly preventing the risk associated with contamination by Legionella.

The inventors have, in the first instance, sequenced the complete genome of L.
pneumophila Paris strain in the form of 56 contigs (SEQ ID No. 1 to 56), a sequence
made up of a chromosome of around 3.65 Mb and a plasmid of around 36 kb.

The inventors have likewise identified on these contigs (SEQ ID No. 1 to 56) the
nucleic sequences coding for proteins, with their respective functions (cf. Table I with the
annotated sequences). The inventors have additionally compared these sequences to the
sequences of the genome of L. pneumophila Philadelphia strain available at the website
http://genome3.cpmc.columbia.edu/-legion/index.html and determined whether or not they
are present in the sequences of this Philadelphia strain. The sequences of the genome of L.
pneumophila Philadelphia strain available on the website
http://genome3.cpmc.columbia.edu/-legion/index.html correspond to the sequences of
the 51 contigs identified in the sequence listing under SEQ ID Nos. 3456 to 3506.
This comparison, made between the available genomic sequence of L. pneumophila
Philadelphia strain, translated in the 6 possible reading frames, and the protein sequences
obtained by the inventors, has revealed that some 88 % of these two genomes are very
strongly conserved (95 to 100 % protein identity), the remaining 12 % being specific
to each strain (cf. Tables I and IV). These results thus demonstrate that there is wide
genomic diversity within the L. pneumophila species. In particular, a serine protease
autotransporter homolog, into which ten tandem repeats of a 60-amino-acid motif are
inserted, was identified among the genes specific to the Paris strain. The
autotransporters are secretion systems of Gram-negative bacteria in which the N-terminal
and C-terminal parts respectively enable secretion across the inner
membrane and the formation of pores in the outer membrane. The central part of the
protein can then remain exposed at the cell surface or can be cleaved and
released into the external medium. The role of certain autotransporters in the virulence
of Gram-negative bacteria has already been shown; furthermore, work on the
autotransporters of the enterobacteria has helped identify a group of serine proteases
present solely in pathogenic bacteria, whose diversity of functions could be linked
to adaptation to the niche occupied by the pathogen.

In addition, the inventors have revealed very wide inter-species diversity from
the sequencing of a pathogenic strain of L. longbeachae, which causes very few cases of
legionellosis in France but is the major source of legionellosis in Australia, and
that of a non-pathogenic L. anisa strain. To date, the inventors have been able to identify
703 ORFs of the L. longbeachae strain, of which 53 % are specific to it; the majority of
the ORFs conserved between the L. longbeachae and L. pneumophila Paris strains have a
percentage of protein homology greater than 80 %. The preliminary results obtained by
the inventors on the non-pathogenic L. anisa strain have helped identify 54 % of
specific sequences and a percentage of protein homology greater than 70 % for the
sequences conserved between the L. anisa and L. pneumophila Paris strains.

Tables I and II (« bestblast » obtained for each of the nucleic and protein
sequences corresponding to the annotated ORFs) and X to XXI hereinbelow give,
for each of the ORFs identified either in the Paris strain (Tables I and XIV) or in the
Lens strain (Table XVI), its position on the contigs or chromosomes and, where
applicable, the existence of a signal peptide and the best result of the blast on nrprot
(Best-Blastp). The ORFs:

- specific to the L. pneumophila Paris strain relative to the L. pneumophila
Philadelphia strain;

- specific to the L. pneumophila Paris strain relative to the L. pneumophila
Philadelphia and Lens strains (Table XVII);

- specific to the L. pneumophila Lens strain relative to the L. pneumophila
Philadelphia and Paris strains (Table XVIII);

- specific to the L. pneumophila Philadelphia strain relative to the L.
pneumophila Paris and Lens strains (Table XI);

were identified by considering as specific the ORFs having a percentage of protein
identity less than 75 %. Where an ORF is conserved in the two genomes, the
percentage of identity between the two proteins is given.
In this Table I, the ORFs present in the partial sequence of the L. longbeachae
strain have likewise been noted. Finally, the ORFs specific to the Legionella genus were
identified by considering as specific the ORFs having a percentage of protein identity
with sequences of the nrprot bank less than 25 %.
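The two identity thresholds described above (75 % between strains, 25 % against nrprot) can be illustrated by the following minimal Python sketch. The ORF names and identity values are hypothetical, and treating an ORF with no hit at all as specific is an assumption of this sketch rather than something stated in the description.

```python
from typing import Optional

STRAIN_SPECIFIC_BELOW = 75.0  # % protein identity between strains (from the text)
GENUS_SPECIFIC_BELOW = 25.0   # % protein identity against nrprot (from the text)


def is_specific(best_identity: Optional[float], threshold: float) -> bool:
    """Return True when the best hit stays below the threshold.

    Assumption of this sketch: None (no hit found) also counts as specific.
    """
    return best_identity is None or best_identity < threshold


# Hypothetical best-hit identities (%) of Paris-strain ORFs against Philadelphia.
best_hits = {"orf_a": 98.2, "orf_b": 61.0, "orf_c": None}

paris_specific = sorted(
    name for name, identity in best_hits.items()
    if is_specific(identity, STRAIN_SPECIFIC_BELOW)
)
print(paris_specific)
```

The comparison is strict, so an ORF with exactly 75 % identity is treated as conserved, in line with the wording « less than 75 % ».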
Table XIX lists the ORFs present in both the Paris and Lens strains
but absent in the Philadelphia strain.

Table XX lists the ORFs present in both the Paris and Philadelphia
strains but absent in the Lens strain.

Finally, Table XXI lists the ORFs present in both the Philadelphia and
Lens strains but absent in the Paris strain.
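The logic behind Tables XIX to XXI (ORFs shared by two strains and absent from the third) amounts to set intersection followed by set difference. The sketch below illustrates this with hypothetical ORF identifiers; it does not reproduce any content of the actual tables.

```python
def shared_by_two_absent_from_third(first: set, second: set, third: set) -> set:
    """ORFs present in both `first` and `second` but absent from `third`."""
    return (first & second) - third


# Hypothetical ORF presence sets for the three strains.
paris = {"orf1", "orf2", "orf3", "orf5"}
lens = {"orf1", "orf2", "orf4"}
philadelphia = {"orf1", "orf3", "orf4"}

table_xix_like = shared_by_two_absent_from_third(paris, lens, philadelphia)
table_xx_like = shared_by_two_absent_from_third(paris, philadelphia, lens)
table_xxi_like = shared_by_two_absent_from_third(philadelphia, lens, paris)
```

In practice, membership in each set would itself be decided by an identity threshold such as the 75 % criterion described earlier.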

In conclusion, the diversity revealed by the content of the present patent
application helps define protein probes, such as antibodies, or DNA probes for
developing a typing tool. Using this tool on a large number of strains
isolated from patients and strains isolated from the environment will enable the tool to
be validated. Such a tool will aid in predicting the risk associated with a strain by
reliably discriminating the strains isolated from patients from other strains.

Among the significant families of proteins of Legionella pneumophila Paris
strain, the family of surface proteins, that of proteins involved in the biosynthesis of
surface polysaccharides, or again that of proteins involved in the virulence
of these bacteria can be cited. The process of evolution has allowed the development of a
number of unique mechanisms in Gram+ bacteria by which they can immobilize proteins on
their surface. The functions of these different cell-wall proteins are extremely
diverse. However, many proteins linked covalently to the surface of Gram+
pathogens are thought to be important for the survival of the pathogen inside the
infected host. The study of Legionella pneumophila Paris strain demands novel
approaches, in particular genetic ones, to improve understanding of the different metabolic
pathways of this organism.

Accordingly, it is an object of the present invention to disclose the complete
sequence of the genome of Legionella pneumophila Paris strain (Collection de
l'Institut Pasteur, CIP 107-629-T), a sequence obtained from a collection of clones (BAC)
deposited on 19 November 2003 with the Collection Nationale de Cultures de
Microorganismes (CNCM) [National Collection of Microorganism Cultures], 25 rue du
Docteur Roux, 75724 Paris Cedex 15, France, according to the provisions of the
Budapest Treaty and registered under accession number I-3137, as well as all the genes
contained in said genome.

It is another object of the present invention to disclose the complete
sequence of the genome of Legionella pneumophila Lens strain, a sequence obtained
from a collection of clones (BAC) deposited on 23 September 2004 with the Collection
Nationale de Cultures de Microorganismes (CNCM) [National Collection of
Microorganism Cultures], 25 rue du Docteur Roux, 75724 Paris Cedex 15, France,
according to the provisions of the Budapest Treaty and registered under accession number
I-3306, as well as all the genes contained in said genome.

Indeed, knowledge of the genome of these organisms enables the interactions
between the different genes, the different proteins, and the different metabolic pathways
to be better defined. Unlike the disclosure of isolated sequences, the complete
genomic sequence of an organism forms a whole, making immediately available all the
information this organism needs to grow and function.

Since the present invention provides the nucleotide sequence of the genome of
Legionella pneumophila Paris strain (Collection de l'Institut Pasteur, CIP 107-629-T),
for which a collection of clones (BAC) covering this genome was deposited
with the C.N.C.M. in Paris on 19 November 2003 and registered under accession number
I-3138, likewise provides the nucleotide sequence of the genome of Legionella
pneumophila Lens strain, a sequence obtained from a collection of clones (BAC) deposited
on 23 September 2004 with the CNCM, and likewise provides certain polypeptide
sequences encoded by these two genomes, the person skilled in the art will be able to
determine the other ORFs by using known methods and appropriate software.

In the set of claims hereinbelow, the term « nucleotide sequence » can
in particular be replaced by the term « polynucleotide » without modifying the
subject matter or the scope of the set of claims as filed.

The present invention thus relates to:

- a genomic nucleotide sequence of Legionella pneumophila Paris strain,
characterized in that it is selected among the sequences SEQ ID 3507 and 3508, SEQ ID
N° 55 and the sequences SEQ ID N° 1 to SEQ ID N° 54, and SEQ ID N° 56;

- a genomic nucleotide sequence of Legionella pneumophila Lens strain,
characterized in that it is selected among the sequences SEQ ID 6733 and 6734.

The present invention likewise relates to an isolated or purified nucleotide
sequence:

(A) of Legionella pneumophila Paris strain, characterized in that it is selected
among:

a) a nucleotide sequence comprising at least one sequence having 80 %
identity with the sequences SEQ ID 3507 and 3508, and SEQ ID N° 1 to SEQ ID N° 56;

b) a nucleotide sequence hybridizing in very stringent conditions with the
sequences SEQ ID 3507 and 3508, and SEQ ID N° 1 to SEQ ID N° 56;

c) a nucleotide sequence complementary to sequences SEQ ID 3507 and
3508 and SEQ ID N° 1 to SEQ ID N° 56, or complementary to a nucleotide sequence
such as defined in a) or b), or a corresponding RNA nucleotide sequence; and

d) a nucleotide sequence of a representative fragment of at least 15 nucleotides
of sequences SEQ ID 3507 and 3508, and SEQ ID N° 1 to SEQ ID
N° 56, or of a representative fragment of their sequence, or

(B) of Legionella pneumophila Lens strain, characterized in that it is selected
among:

a) a nucleotide sequence comprising at least one sequence having 80 %
identity with the sequences SEQ ID 6733 and 6734;

b) a nucleotide sequence hybridizing in very stringent conditions with the
sequences SEQ ID 6733 and 6734;

c) a nucleotide sequence complementary to sequences SEQ ID 6733 and
6734, or complementary to a nucleotide sequence such as defined in a) or b), or a
corresponding RNA nucleotide sequence; and

d) a nucleotide sequence of a representative fragment of at least 15 nucleotides
of sequences SEQ ID 6733 and 6734, or of a representative fragment of
their sequence.

More particularly, the subject of the present invention is likewise the nucleotide
sequences characterized in that they originate from sequences SEQ ID 3507 and 3508
and SEQ ID N° 1 to SEQ ID N° 56, and in that they code for a polypeptide selected
from amongst the polypeptides of sequences SEQ ID 3509 to SEQ ID 6732, and SEQ
ID N° 56 to SEQ ID N° 3455, preferably coding for a secreted enzyme likewise present
in the Lens strain and Philadelphia strain, of sequences SEQ ID 3675, 4267, 4292 and
6477, preferably coding for a polypeptide present on the surface of Legionella
pneumophila Paris strain of sequence SEQ ID Nos. 3410, 704, 746, 2267, 2751, 3192,
3218, 3221, 3222, 3317, 3324, 136, 171, 310, 337, 481, 527, 652, 664, 893, 972, 1148,
1298, 1361, 1503, 1521, 1576, 1651, 1755, 1847, 1877, 2224, 2406, 2843, 2930, 3037,
3139, 3157, 3165, 3181, preferably coding for a polypeptide present on the surface
of Legionella pneumophila Paris strain and specific relative to the Philadelphia strain,
especially of sequence SEQ ID Nos. 3410, 171, 337, 481, 652, 1148, 1521, 2843, 3037,
3181, or one of its representative fragments of at least 5 amino acids, or coding for a
polypeptide involved in the biosynthesis of polysaccharides, [...] 2204,
2212, [...] 2324, 2378, [...], or again the nucleic sequences of the rtxA gene
[...] coding for the polypeptides of sequence SEQ ID [...].

The present invention also relates more generally to the nucleotide sequences
derived from sequences SEQ ID Nos. 3507 and 3508, and SEQ ID N° 1 to SEQ ID
N° 56, and coding for a polypeptide of Legionella pneumophila Paris strain, such as
[...] SEQ ID Nos. 3507 [...].

In addition, the nucleotide sequences characterized in that they comprise a
nucleotide sequence selected among:

a) a nucleotide sequence coding for a polypeptide selected from amongst the
sequences SEQ ID Nos. 3509 to 6732, and SEQ ID N° 56 to SEQ ID N° 3455;

b) a nucleotide sequence comprising at least 80 %, 85 %, 90 %, 95 % or 98 % of
identity with a nucleotide sequence coding for a polypeptide selected from amongst the
sequences SEQ ID Nos. 3509 to 6732, and SEQ ID N° 56 to SEQ ID N° 3455;

c) a nucleotide sequence hybridizing in very stringent conditions with a
nucleotide sequence coding for a polypeptide selected from amongst the sequences
SEQ ID Nos. 3509 to 6732, and SEQ ID N° 56 to SEQ ID N° 3455;

d) a complementary nucleotide sequence or RNA corresponding to a sequence
such as defined in a), b) or c);

e) a nucleotide sequence of a representative fragment having at least 15
nucleotides of a sequence such as defined in a) or d); and

f) a modified nucleotide sequence of a sequence such as defined in a), d) or e)

are likewise subjects of the invention.

Nucleic acid, nucleic sequence or nucleic acid sequence, polynucleotide, oligonucleotide,
polynucleotide sequence and nucleotide sequence, terms which will be employed
interchangeably in the present description, are understood to designate a precise chain of
nucleotides, modified or not, defining a fragment or a region of a nucleic
acid, comprising or not non-natural nucleotides, and able to correspond just as well to a
double-strand DNA or a single-strand DNA as to transcription products of said DNAs.
The nucleic acids according to the invention also encompass
PNA (Peptide Nucleic Acid), or similar.
It must be understood that the present invention does not relate to the nucleotide
sequences in their natural chromosomal environment, that is, in the natural state. These
are sequences which have been isolated and/or purified, that is, they were sampled directly
or indirectly, for example by copying, their environment having been at least partially
modified. The term likewise designates nucleic acids obtained by chemical
synthesis.

« Percentage of identity » between two sequences of nucleic acids or amino
acids in the sense of the present invention is understood to designate a percentage of
identical nucleotides or amino acid residues between the two sequences to be
compared, obtained after the best alignment, this percentage being purely statistical and
the differences between the two sequences being distributed randomly and over their
entire length. « Best alignment » or « optimal alignment » is understood to designate the
alignment for which the percentage of identity determined hereinafter is the highest.
Comparisons between two sequences of nucleic acids or amino acids
are traditionally made by comparing these sequences after they have been aligned in
optimal fashion, said comparison being made by segment or by « window of comparison » to
identify and compare the local regions of sequence similarity. The optimal alignment
of the sequences for comparison can be made, apart from manually, by means of the
local homology algorithm of Smith and Waterman (1981, Adv. Appl. Math. 2:482), by means
of the homology alignment algorithm of Needleman and Wunsch (1970, J. Mol. Biol. 48:443),
by means of the similarity search method of Pearson and Lipman (1988, Proc. Natl. Acad.
Sci. USA 85:2444), or by means of software using these algorithms (GAP, BESTFIT,
BLAST P, BLAST N, FASTA and TFASTA in the Wisconsin Genetics Software
Package, Genetics Computer Group, 575 Science Dr., Madison, WI). To obtain optimal
alignment, the program BLAST is preferably used, with the BLOSUM 62 matrix. The
PAM or PAM250 matrices can likewise be used.
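As an illustration of what an alignment algorithm of the Needleman and Wunsch type does, the following Python sketch performs a global alignment of two short sequences. It is a teaching sketch only: the scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption chosen for brevity and is not the BLOSUM 62 or PAM scoring mentioned above, and the function name is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best_score = needleman_wunsch("ACGT", "AGT")
```

For real genome-scale comparisons, the programs named in the text (BLAST, FASTA and the like) would be used; a quadratic-memory implementation such as this one is only practical for short sequences.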

The percentage of identity between two sequences of nucleic acids or amino
acids is determined by comparing these two optimally aligned sequences, in which the
sequence of nucleic acids or amino acids to be compared can comprise additions or
deletions relative to the reference sequence for optimal alignment between these two
sequences. The percentage of identity is calculated by determining the number of
identical positions, that is, positions for which the nucleotide or amino acid residue is
identical between the two sequences, by dividing this number of identical positions by the
total number of positions compared, and by multiplying the result obtained by 100 to obtain
the percentage of identity between these two sequences.
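The calculation just described can be written out as a short Python sketch operating on two sequences that have already been aligned (gaps written as "-"). Counting every alignment column, including gap columns, as a « position compared » is an assumption of this sketch; other conventions exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = len(aligned_a)  # assumption: gap columns count as compared positions
    if compared == 0:
        raise ValueError("nothing to compare")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / compared


# Four identical columns out of six aligned columns.
example = percent_identity("ACGT-A", "ACCTGA")
```

The result can be compared directly with the 80 %, 85 %, 90 %, 95 % or 98 % thresholds used throughout the description.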

Nucleic sequences having a percentage of identity of at least 80 %, preferably
85 % or 90 %, more preferably 95 % or 98 % or 99 %, after optimal alignment with a
reference sequence are understood to designate the nucleic sequences exhibiting, relative
to the nucleic reference sequence, certain modifications such as in particular deletion,
truncation, elongation, chimeric fusion and/or substitution, especially point substitution,
and whose nucleic sequence has at least 80 %, preferably 85 %, 90 %, 95 %, 98 % or
99 %, of identity after optimal alignment with the nucleic reference sequence. These are
preferably sequences whose complementary sequences are capable of
hybridizing specifically with the reference sequences. Preferably, the specific
hybridization conditions or stringent conditions will be such that they ensure at least
80 %, preferably 85 %, 90 %, 95 %, 98 % or 99 % of identity after optimal alignment
between one of the two sequences and the sequence complementary to the other.

Hybridization in very stringent conditions signifies that the temperature and
ionic strength conditions are selected in such a way that they permit hybridization to be
maintained between two fragments of complementary DNA. By way of illustration,
very stringent conditions of the hybridization stage for the purpose of defining the
polynucleotide fragments described hereinabove are advantageously the following.

DNA-DNA or DNA-RNA hybridization is performed in two stages: (1)
prehybridization at 42°C for 3 hours in phosphate buffer (20 mM, pH 7.5) containing
5 x SSC (1 x SSC corresponds to a solution of 0.15 M NaCl + 0.015 M sodium citrate),
50 % formamide, 7 % sodium dodecyl sulfate (SDS), 10 x Denhardt's, 5 % dextran
sulfate and 1 % salmon sperm DNA; (2) hybridization per se for 20 hours at a
temperature depending on the size of the probe (i.e.: 42°C for a probe of size > 100
nucleotides), followed by 2 washes of 20 minutes at 20°C in 2 x SSC + 2 % SDS and 1
wash of 20 minutes at 20°C in 0.1 x SSC + 0.1 % SDS. The last wash is done in 0.1 x
SSC + 0.1 % SDS for 30 minutes at 60°C for a probe of size > 100 nucleotides. The
very stringent hybridization conditions described hereinabove for a polynucleotide of
defined size can be adapted by the person skilled in the art for oligonucleotides of larger
or smaller size, according to the teaching of Sambrook et al. (1989, Molecular cloning: a
laboratory manual. 2nd Ed. Cold Spring Harbor).
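Since the buffers above are expressed as multiples of 1 x SSC, the corresponding salt concentrations follow by simple scaling. The Python sketch below does this arithmetic and tabulates the wash steps as read from the text; the data layout and function names are hypothetical, and reading the final 60°C wash as a fourth, separate wash is an interpretation of the wording.

```python
# Composition of 1 x SSC in mol/L, as defined in the text.
ONE_X_SSC = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(strength: float) -> dict:
    """Molar concentrations of each solute in `strength` x SSC."""
    return {solute: strength * molar for solute, molar in ONE_X_SSC.items()}


# Wash steps for a probe > 100 nucleotides:
# (repeats, minutes per wash, temperature in °C, SSC strength, SDS in %)
WASHES = [
    (2, 20, 20, 2.0, 2.0),
    (1, 20, 20, 0.1, 0.1),
    (1, 30, 60, 0.1, 0.1),
]


def total_wash_minutes(washes) -> int:
    """Total time spent washing, in minutes."""
    return sum(repeats * minutes for repeats, minutes, *_ in washes)


prehyb = ssc_molarity(5)          # 5 x SSC used in prehybridization
final_wash = ssc_molarity(0.1)    # 0.1 x SSC used in the last washes
```

Lowering the SSC strength and raising the temperature in the final wash is what makes the conditions more stringent, since lower ionic strength and higher temperature both destabilize imperfectly matched duplexes.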

In addition, a representative fragment of sequences according to the invention is
understood to designate any nucleotide fragment having at least 15 nucleotides,
preferably at least 20, 25, 30, 50, 75, 100, 150, 300 or 450 consecutive nucleotides of
the sequence from which it originates.
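The notion of a fragment of at least 15 consecutive nucleotides can be made concrete with the short Python sketch below, which enumerates all such fragments of a sequence and checks whether a candidate qualifies. The example sequence is invented for illustration, and only the given strand is examined (the reverse complement is not considered here).

```python
MIN_FRAGMENT_NT = 15  # minimum fragment length stated in the text


def consecutive_fragments(sequence: str, length: int = MIN_FRAGMENT_NT) -> list:
    """All fragments of `length` consecutive nucleotides of `sequence`."""
    if length < MIN_FRAGMENT_NT:
        raise ValueError(f"fragments must have at least {MIN_FRAGMENT_NT} nucleotides")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def is_representative_fragment(candidate: str, reference: str) -> bool:
    """True if `candidate` is >= 15 nt and occurs contiguously in `reference`."""
    return (
        len(candidate) >= MIN_FRAGMENT_NT
        and candidate.upper() in reference.upper()
    )


reference = "ATGGCTAGCTAGGATCCGTA"  # invented 20-nt example sequence
fragments = consecutive_fragments(reference)
```

A sequence of N nucleotides yields N - 14 overlapping 15-nucleotide fragments, which is one way candidate probes or primers could be enumerated before filtering for specificity.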

Representative fragment is understood in particular to be a nucleic sequence
coding for a biologically active fragment of a polypeptide, such as defined hereinbelow.

Representative fragment is likewise understood to be the intergenic sequences,
and in particular the nucleotide sequences bearing the regulation signals (promoters,
terminators, enhancers, etc.), or again probe or primer sequences aiding in specifically
detecting or amplifying the nucleic sequences coding for the polypeptides of sequences
SEQ ID Nos. 3509 to 6732, and SEQ ID N° 56 to SEQ ID N° 3455.

Of said representative fragments, those having nucleotide
sequences corresponding to open reading frames, known as ORF sequences (ORF for
« Open Reading Frame »), are preferred. These are included in general between an
initiation codon and a stop codon, or between two stop codons, and code for polypeptides,
preferably of at least 100 amino acids, such as for example, without limitation, the ORF
sequences to be described hereinafter.
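The ORF definition above (an initiation codon followed in frame by a stop codon, on either strand, encoding at least 100 amino acids) can be sketched in Python as follows. This is a simplified illustration: it assumes ATG as the only initiation codon and the standard stop codons, it does not implement the « between two stop codons » variant, and it is not the annotation method actually used by the inventors.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100, start_codon: str = "ATG") -> list:
    """Scan the six reading frames for start-to-stop ORFs.

    Returns tuples (strand, frame, start, end); coordinates are 0-based,
    end-exclusive, and refer to the scanned strand (for "-", the reverse
    complement). `end` includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == start_codon:
                    start = i  # first in-frame initiation codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_aa:  # codons before the stop
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


# Tiny invented example: ATG + 4 codons + stop, found with a lowered threshold.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
toy_orfs = find_orfs(toy, min_aa=5)
```

With the default `min_aa=100`, only ORFs encoding at least 100 amino acids are reported, matching the preferred minimum size given in the text.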

The representative fragments according to the invention can be obtained, for
example, by specific amplification such as PCR, or after digestion by appropriate
restriction enzymes of nucleotide sequences according to the invention, this method
being described in particular in the work of Sambrook et al. Said representative
fragments can likewise be obtained by chemical synthesis as long as their size is not too
large, according to methods well known to the person skilled in the art.

The representative genome fragments of Legionella pneumophila Paris strain
according to the invention likewise comprise at least one fragment of at least 15
nucleotides or more, as cited above, for the fragments resulting from enzymatic cutting
at the level of a restriction site. Of course, expression products such as RNA or
proteins are likewise encompassed by the present invention.

Among the sequences containing sequences of the invention, or representative
fragments, are likewise understood the sequences which are naturally flanked by
sequences having at least 80 %, 85 %, 90 %, 95 % or 98 % identity with the
sequences according to the invention.

Modified nucleotidic sequence is understood as any nucleotidic sequence
obtained by mutagenesis according to techniques well known to the specialist, and
comprising, preferably, a maximum of 10 %, 7.5 %, 5 %, 2.5 %, 1 %, 0.5 %, 0.1 % or
even less than 0.01 % of modified nucleotides relative to the normal sequences, for
example mutations in the regulatory and/or promoter sequences for the expression of
the polypeptide, especially leading to modification of the level of expression or the
activity of said polypeptide.

Modified nucleotidic sequence is likewise understood as any nucleotidic
sequence coding for a modified polypeptide as defined hereinbelow.

The representative fragments according to the invention can likewise be probes
or primers, which can be utilized in processes for detection, identification, assay or
amplification of nucleic sequences.

In a preferred manner, the invention relates to a nucleotidic sequence coding
for a polypeptide according to the invention.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a polypeptide specific to a
bacterium of the genus Legionella, or one of its fragments of at least 5 amino acids, or
its complementary nucleic sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a polypeptide specific to a
pathogenic bacterium of the genus Legionella and/or of the Legionella pneumophila
species, or one of its fragments of at least 5 amino acids, or its complementary nucleic
sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a polypeptide specific to a
bacterium of the Legionella pneumophila species, Paris strain, or one of its fragments of
at least 5 amino acids, or its complementary nucleic sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a polypeptide specific to a
bacterium of the Legionella pneumophila species, Paris strain, relative to the
Philadelphia strain, or one of its fragments of at least 5 amino acids, in particular
selected from amongst the polypeptides of sequence SEQ ID Nos. 3410, 171, 337, 481,
652, 1148, 1521, 2843, 3037, 3181, or one of its fragments of at least 5 amino acids, or
its complementary nucleic sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a polypeptide specific to a
bacterium of the Legionella pneumophila species, Paris strain, relative to the Lens and
Philadelphia strains, or one of its fragments of at least 5 amino acids, in particular
selected from amongst the polypeptides whose sequences are indicated in Table
XVII, or one of their fragments of at least 5 amino acids, or their complementary nucleic
sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a surface polypeptide of
Legionella pneumophila Paris strain, or one of its fragments of at least 5 amino acids, in
particular selected from amongst the polypeptides of sequence SEQ ID Nos. 3410, 704,
746, 2267, 2751, 3192, 3218, 3221, 3222, 3317, 3324, 136, 171, 310, 337, 481, 527,
652, 664, 893, 972, 1148, 1298, 1361, 1503, 1521, 1576, 1651, 1755, 1847, 1877, 2224,
2406, 2843, 2930, 3037, 3139, 3157, 3165, 3181, or one of its fragments of at least 5
amino acids, or its complementary nucleic sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a surface polypeptide
specific to Legionella pneumophila Paris strain relative to the Philadelphia strain,
selected from amongst the polypeptides of sequences SEQ ID Nos. 3410, 171, 337, 481,
652, 1148, 1521, 2843, 3037, 3181, or one of its fragments of at least 5 amino acids, or
its complementary nucleic sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a polypeptide involved in
the biosynthesis of polysaccharides of the cellular envelope of Legionella pneumophila
Paris strain, in particular selected from amongst the polypeptides of sequence SEQ ID
Nos. 1126, 3218, 288, 632, 917, 1503, 1555, 1877, 1928, 1963, 2204, 2212, 2243, 2324,
2378, 2410, 2411, or one of its representative fragments of at least 5 amino acids, or its
complementary nucleic sequence.

In a preferred manner, the invention relates to a nucleotidic sequence
according to the invention, characterized in that it codes for a polypeptide of Legionella
pneumophila Paris strain coded by the rtxA gene, of sequence SEQ ID Nos. 3410, 3037,
3165 and 3181, or one of its representative fragments of at least 5 amino acids, or its
complementary nucleic sequence.

The rtxA gene of Legionella pneumophila has been shown to be involved in
virulence. This gene codes for a protein of 1208 aa. Four ORFs, whose references
are given hereinbelow (SEQ ID 3410 - 3037 - 3165 - 3181), correspond to this gene in
the Paris strain, where it would code for a protein of at least 4000 aa. Comparison with
the Philadelphia strain shows the presence of a homologous gene for the N- and
C-terminal parts. The central part of the protein consists of repetitions of a pattern of
around 20 aa that differs between the two strains.

In another aspect, in a preferred manner, the invention relates to a nucleotidic
sequence according to the invention, characterized in that it codes for a polypeptide of
Legionella pneumophila Paris strain, or one of its representative fragments of at least 5
amino acids, involved in the biosynthesis of amino acids, in the biosynthesis of
cofactors, prosthetic and transporter groups, involved in the cellular machinery, involved
in the central intermediate metabolism, involved in the energetic metabolism, involved in
the metabolism of fatty acids and phospholipids, involved in the metabolism of
nucleotides, purines, pyrimidines or nucleosides, involved in the functions of regulation,
involved in the process of replication, involved in the process of transcription, involved
in the process of translation, involved in the process of transport and binding of
proteins, involved in adaptation to atypical conditions, involved in sensitivity to
medications and the like, or involved in the functions relating to transposons.

Owing to the genomic sequence presented in the present invention, the specialist
will know how to identify the genes coding for proteins regulating transcription of the
genes of Legionella pneumophila Paris strain. In addition, Table I provides the list of
the open reading frames (ORF for « Open Reading Frame ») annotated and identified on
the genome of Legionella pneumophila Paris strain (SEQ ID N° 1 to SEQ ID N° 55),
with especially their position on said genome, and Table II the putative functions
which can be attributed to them by utilizing customary genomic comparison
techniques (« Bestblast »). Nevertheless, such a list must not be considered limiting,
since one protein can have several roles in the cell.

Modifying the structure or the integrity of these genes could help modify the
expression of the target genes controlled by the target promoters of these regulators.
Thus, the expert will be able to select the regulator(s) pertinent for the required
application as well as their target, thus allowing optimization of the expression of genes
of interest. The utilization of the tools described above, such as DNA chips, also
makes it possible to record all the genes whose regulation is modified by inactivation of
certain genes. It is thus possible to select a set of control sequences responding to the
same type of regulation. These sequences can then be used to control the expression of
genes of interest.

In general, the list of sequences SEQ ID, or their corresponding coding nucleic
sequences, could be determined by the specialist from the most probable putative
functions determined for each of the sequences SEQ ID in Tables I and XIV
hereinbelow for each of the classes of activity classified hereinbelow.

It is important to note, however, that a living organism is a whole and must be
taken as such. Accordingly, in order to develop and exhibit its properties, any organism
needs interaction between its different metabolic pathways. Therefore, the
abovementioned classification must not be considered limiting, since a gene can be
involved in two distinct metabolic pathways.

The invention likewise relates to polypeptides coded by a nucleotide sequence
according to the invention, preferably by a representative fragment of the sequences
SEQ ID Nos. 3509 to 6732, the sequence SEQ ID N° 55 or sequences SEQ ID N° 1 to
SEQ ID N° 54, and SEQ ID N° 56, and corresponding to an ORF sequence, such as
described in Table XIV (coding for one of the sequences SEQ ID N° 3509 to SEQ ID
N° 6732), and in Table I (coding for one of the sequences SEQ ID N° 57 to SEQ ID
N° 3455).

In particular, polypeptides of Legionella pneumophila Paris strain, characterized 
in that they are selected from amongst the following polypeptides: 

- polypeptides of sequences SEQ ID Nos. 3509 to 6732 and of sequences SEQ ID 
N° 56 to SEQ ID N° 3455; 

- preferably enzymes secreted by Legionella pneumophila Paris, Lens and 
Philadelphia strains, especially sequences SEQ ID Nos. 3675, 4267, 4292 and 6477; 

- preferably polypeptides present on the surface of Legionella pneumophila Paris
strain, especially of sequence SEQ ID Nos. 3410, 704, 746, 2267, 2751, 3192, 3218,
3221, 3222, 3317, 3324, 136, 171, 310, 337, 481, 527, 652, 664, 893, 972, 1148, 1298,
1361, 1503, 1521, 1576, 1651, 1755, 1847, 1877, 2224, 2406, 2843, 2930, 3037, 3139,
3157, 3165, 3181, and still more preferably surface polypeptides specific to
Legionella pneumophila Paris strain relative to the Philadelphia strain, especially
those of sequences SEQ ID Nos. 3410, 171, 337, 481, 652, 1148, 1521, 2843, 3037,
3181, or one of its representative fragments of at least 5 amino acids;

- polypeptides involved in the biosynthesis of polysaccharides of the cellular
envelope, especially of sequence SEQ ID Nos. 1126, 3218, 288, 632, 917, 1503, 1555,
1877, 1928, 1963, 2204, 2212, 2243, 2324, 2378, 2410, 2411;

- or again polypeptides of sequence SEQ ID Nos. 3410, 3037, 3165 and 3181,
coded by the rtxA gene.

The invention likewise comprises polypeptides characterized in that they
comprise a polypeptide selected from amongst:

a) a polypeptide of sequences SEQ ID Nos. 3509 to 6732, and of sequences
SEQ ID N° 56 to SEQ ID N° 3455;

b) a polypeptide having at least 80 %, preferably 85 %, 90 %, 95 % and
98 % of identity with a polypeptide of sequences SEQ ID Nos. 3509 to 6732, and of
sequences SEQ ID N° 56 to SEQ ID N° 3455;

c) a fragment of at least 5 amino acids, preferably biologically active, of
one such as defined in b);

d) a biologically active fragment of a polypeptide of sequence SEQ ID
N° 56 to SEQ ID N° 3455; and

e) a modified polypeptide of a polypeptide of sequences SEQ ID Nos. 3509
to 6732, and of sequences SEQ ID N° 56 to SEQ ID N° 3455, comprising at most 10 %
modified amino acids, preferably 7.5 %, 5 %, 2.5 %, 1 %, 0.5 %, 0.1 % or
0.01 %.

The nucleotidic sequences coding for the above-described polypeptides are
likewise an object of the invention.

In the present description, the terms polypeptides, polypeptide sequences, 
peptides and proteins are interchangeable. 

It must be understood that the invention does not relate to polypeptides in their
natural form, that is, taken in their natural environment; rather, it relates to
polypeptides that have been isolated or obtained by purification from natural sources,
or else obtained by genetic recombination or by chemical synthesis, and that can thus
comprise non-natural amino acids, as will be described hereinbelow.

A polypeptide having a certain percentage of identity with another, likewise
designated a homologous polypeptide, is understood to designate those polypeptides
having, relative to natural polypeptides, certain modifications, in particular a deletion,
addition or substitution of at least one amino acid, a truncation, an elongation, a
chimeric fusion and/or a mutation, or polypeptides having post-translational
modifications. Of the homologous polypeptides, preference is given to those whose
amino acid sequence has at least 80 %, preferably 85 %, 90 %, 95 %, 98 % and 99 % of
identity with the amino acid sequences of the polypeptides according to the invention.
In the case of substitution, one or more consecutive or non-consecutive amino acid(s)
are replaced by « equivalent » amino acids. The expression « equivalent amino acids »
is here intended to designate any amino acid capable of being substituted for one of the
amino acids of the base structure without, however, essentially modifying the biological
activities of the corresponding peptides, such as they will be defined hereinbelow.

These equivalent amino acids can be determined either by relying on
their structural homology with the amino acids for which they are substituted, or on
the results of comparative assays of biological activity that can be carried out between
the different polypeptides.

By way of example, mention is made of the substitutions that can be
carried out without resulting in extensive modification of the biological activity of the
corresponding modified polypeptide. Thus leucine can be replaced by valine or
isoleucine, aspartic acid by glutamic acid, glutamine by asparagine, arginine by lysine,
etc., the inverse substitutions naturally being envisaged under the same conditions.
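
The example substitutions listed above can be encoded, purely for illustration, as a
small lookup that flags which positions of a variant differ from a reference by
something other than a listed « equivalent » replacement. The names `EQUIVALENT_PAIRS`,
`is_equivalent` and `non_equivalent_positions` are hypothetical, the table contains only
the pairs named in the text (the « etc. » is deliberately not filled in), and both
sequences are assumed to be of equal length and already aligned without gaps.

```python
# Minimal sketch: only the pairs explicitly named in the text, in one-letter code.
# Each pair is unordered, which covers the "inverse substitutions".
EQUIVALENT_PAIRS = {
    frozenset("LV"),  # leucine <-> valine
    frozenset("LI"),  # leucine <-> isoleucine
    frozenset("DE"),  # aspartic acid <-> glutamic acid
    frozenset("QN"),  # glutamine <-> asparagine
    frozenset("RK"),  # arginine <-> lysine
}


def is_equivalent(a, b):
    """True if residues a and b are identical or form one of the listed pairs."""
    a, b = a.upper(), b.upper()
    return a == b or frozenset((a, b)) in EQUIVALENT_PAIRS


def non_equivalent_positions(reference, variant):
    """Return 0-based positions where the variant is neither identical nor equivalent."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length (assumed pre-aligned)")
    return [i for i, (r, v) in enumerate(zip(reference, variant))
            if not is_equivalent(r, v)]


if __name__ == "__main__":
    print(non_equivalent_positions("LDQR", "VEWK"))
```

Whether a given replacement actually preserves biological activity remains a matter for
the comparative assays mentioned above; the lookup only reflects the listed examples.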

The homologous polypeptides likewise correspond to the polypeptides coded by
the homologous or identical nucleotide sequences, such as defined previously, and thus
comprise, in the present definition, mutated polypeptides or polypeptides corresponding
to inter- or intra-species variations which can exist in Legionella, and which correspond
especially to truncations, substitutions, deletions and/or additions of at least one amino
acid residue.

It is understood that the percentage of identity between two polypeptides is 
calculated in the same way as between two sequences of nucleic acids. Thus, the 
percentage of identity between two polypeptides is calculated after optimal alignment of 
these two sequences, on a maximum homology window. To define said maximum 
homology window, the same algorithms as for the nucleic acid sequences can be 
utilized. 
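
As a hedged illustration of the calculation described above, the sketch below computes
a percentage of identity from two sequences that have already been optimally aligned
(for example by an alignment program), with gaps written as `-`. The function name
`percent_identity` and the choice of the alignment length as denominator are assumptions
of this sketch; other conventions for the comparison window exist and give slightly
different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical columns over the aligned window.

    Both inputs are assumed to come from the same pairwise alignment, so they
    must have equal length. Columns where both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Works identically for nucleic acid and polypeptide alignments.
    print(percent_identity("ACGT", "ACGA"))
    print(round(percent_identity("MKT-LV", "MKTALV"), 1))
```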

Biologically active fragment of a polypeptide according to the invention is
understood to mean in particular a fragment of polypeptide, such as defined
hereinbelow, having at least one of the biological characteristics of the polypeptides
according to the invention, especially in that it is capable of exerting in general an
even partial activity, such as for example:

- enzymatic (metabolic) activity, or an activity able to be involved in the
biosynthesis or biodegradation of organic or inorganic compounds;

- structural activity (cellular envelope, chaperone molecule, ribosome);

- transport activity (energy, ion), or activity in the secretion of proteins;

- activity in the process of replication, amplification, repair,
transcription, translation or maturation, especially of DNA, RNA or proteins.

Fragment of polypeptides according to the invention is understood to mean a
polypeptide comprising a minimum of 5 amino acids, preferably 6, 7, 8, 9, 10, 15, 20,
25, 30, 40, 50, 75, 100, 150 or 200 amino acids.

The fragments of polypeptides can correspond to isolated or purified fragments
naturally present in the strains of Legionella, or to fragments which can be obtained by
cleavage of said polypeptide by a proteolytic enzyme such as trypsin, chymotrypsin
or collagenase, by a chemical reagent (cyanogen bromide, CNBr) or by placing said
polypeptide in a highly acidic environment (for example at pH = 2.5). Polypeptide
fragments can likewise be prepared by chemical synthesis, or from hosts transformed by
an expression vector according to the invention which contains a nucleic acid allowing
expression of said fragment, placed under the control of the appropriate elements of
regulation and/or expression.
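
The enzymatic cleavage mentioned above can be previewed in silico. The sketch below is
an assumption-laden illustration, not part of the described process: it applies the
commonly used simplified rule for trypsin (cleavage on the C-terminal side of lysine or
arginine, except when the next residue is proline), and the function name
`trypsin_digest` is hypothetical. Missed cleavages and the other agents cited
(chymotrypsin, collagenase, CNBr, acid) are not modelled.

```python
import re


def trypsin_digest(protein):
    """Split a one-letter protein sequence after K or R, unless followed by P.

    Relies on re.split accepting zero-width matches (Python 3.7 or later).
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if p]  # drop the empty piece after a C-terminal K/R


def fragments_of_min_length(protein, min_aa=5):
    """Keep only digestion products of at least min_aa residues (5 in the text)."""
    return [p for p in trypsin_digest(protein) if len(p) >= min_aa]


if __name__ == "__main__":
    print(trypsin_digest("MKAARPLKG"))
    print(fragments_of_min_length("MKAARPLKG"))
```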

« Modified polypeptide » of a polypeptide according to the invention is
understood to mean a polypeptide obtained by genetic recombination or by chemical
synthesis such as described below, which has at least one modification relative to the
normal sequence, preferably at most 10 % of amino acids modified relative to the
normal sequence, preferably even at most 7.5 %, 5 %, 2.5 %, 1 %, 0.5 %, 0.1 % or
again 0.01 %. These modifications can be made especially to amino acids necessary for
the specificity or efficacy of the activity, or at the origin of the structural conformation,
the charge, or the hydrophobicity of the polypeptide according to the invention.
Polypeptides of equivalent, augmented or diminished activity, or of equivalent,
narrower or wider specificity can thus be created. Among the modified polypeptides,
mention must be made of those polypeptides in which up to five amino acids can be
modified, truncated at the N-terminal or C-terminal end, or else deleted or added.

As indicated, the objects of the modifications of a polypeptide are especially:

- to permit its use in processes of biosynthesis or biodegradation of organic
or inorganic compounds,

- to permit its use in processes of replication, amplification, repair,
transcription, translation or maturation, especially of DNA, RNA or proteins,

- to permit its improved secretion,

- to modify its solubility, efficacy or specificity of activity, or again to
facilitate its purification.

Chemical synthesis likewise has the advantage of being able to utilize non-natural
amino acids or non-peptidic bonds. Therefore, it can be of interest to utilize
non-natural amino acids, for example in the D form, or analogs of amino acids,
especially sulfur-containing forms.

In another aspect, the invention preferably relates to a polypeptide according
to the invention, characterized in that it is a polypeptide specific to a bacterium of the
genus Legionella, or one of its fragments of at least 5 amino acids.

In another aspect, the invention preferably relates to a polypeptide according
to the invention, characterized in that it is a polypeptide specific to a pathogenic
bacterium of the genus Legionella and/or of the Legionella pneumophila species, or one
of its representative fragments of at least 5 amino acids.

In another aspect, preferably, the invention relates to a polypeptide according
to the present invention, characterized in that it is a polypeptide specific to a bacterium
of the Legionella pneumophila species, Paris strain, or one of its representative
fragments of at least 5 amino acids.

In another aspect, preferably, the invention relates to a polypeptide according
to the present invention, characterized in that it is a polypeptide specific to a bacterium
of the Legionella pneumophila species, Paris strain, relative to the Philadelphia strain,
or one of its representative fragments of at least 5 amino acids, in particular selected
from amongst the polypeptides of sequences SEQ ID Nos. 3410, 171, 337, 481, 652, 1148,
1521, 2843, 3037, 3181, or one of its representative fragments of at least 5 amino acids.

In another aspect, preferably, the invention relates to a polypeptide according
to the present invention, characterized in that it is a polypeptide specific to a bacterium
of the Legionella pneumophila species, Paris strain, relative to the Lens and
Philadelphia strains, or one of its representative fragments of at least 5 amino acids, in
particular selected from amongst the polypeptides whose sequence is indicated in Table
XVII, or one of their representative fragments of at least 5 amino acids.

In another aspect, preferably, the invention relates to a polypeptide according
to the present invention, characterized in that it is a surface polypeptide of Legionella
pneumophila Paris strain, or one of its representative fragments of at least 5 amino
acids, in particular selected from amongst the polypeptides of sequence SEQ ID Nos.
3410, 704, 746, 2267, 2751, 3192, 3218, 3221, 3222, 3317, 3324, 136, 171, 310, 337,
481, 527, 652, 664, 893, 972, 1148, 1298, 1361, 1503, 1521, 1576, 1651, 1755, 1847,
1877, 2224, 2406, 2843, 2930, 3037, 3139, 3157, 3165, 3181, or one of its
representative fragments of at least 5 amino acids.

In another aspect, preferably, the invention relates to a polypeptide according
to the present invention, characterized in that it is a surface polypeptide specific to
Legionella pneumophila Paris strain relative to the Philadelphia strain, selected from
amongst the polypeptides of sequence SEQ ID Nos. 3410, 171, 337, 481, 652, 1148,
1521, 2843, 3037, 3181, or one of its representative fragments of at least 5 amino acids.

In another aspect, preferably, the invention relates to a polypeptide according
to the invention, characterized in that it is a polypeptide of Legionella pneumophila
Paris strain involved in the biosynthesis of polysaccharides of the cellular envelope of
Legionella pneumophila Paris strain, in particular selected from amongst the
polypeptides of sequence SEQ ID Nos. 1126, 3218, 288, 632, 917, 1503, 1555, 1877,
1928, 1963, 2204, 2212, 2243, 2324, 2378, 2410, 2411, or one of its representative
fragments of at least 5 amino acids.

In another aspect, preferably, the invention relates to a polypeptide according
to the invention, characterized in that it is a polypeptide of Legionella pneumophila
Paris strain coded by the rtxA gene, of sequence SEQ ID Nos. 3410, 3037, 3165 and
3181, or one of its representative fragments of at least 5 amino acids, or its
complementary nucleic sequence.

In another aspect, preferably, the invention relates to a polypeptide according
to the invention, characterized in that it is a polypeptide of Legionella pneumophila
Paris strain, or one of its representative fragments of at least 5 amino acids, involved in
the biosynthesis of amino acids, in the biosynthesis of cofactors, prosthetic and
transporter groups, involved in the cellular machinery, involved in the central
intermediary metabolism, involved in the energetic metabolism, involved in the
metabolism of fatty acids and phospholipids, involved in the metabolism of nucleotides,
purines, pyrimidines or nucleosides, involved in the functions of regulation, involved in
the process of replication, involved in the process of transcription, involved in the
process of translation, involved in the process of transport and binding of proteins,
involved in the adaptation to atypical conditions, involved in the sensitivity to drugs and
the like, or involved in the functions relating to transposons.

The object of the present invention is likewise the nucleotidic sequences and/or
polypeptides according to the invention, characterized in that said sequences are
recorded on a recording medium whose form and nature facilitate reading, analysis
and/or exploitation of said sequence(s). These media can likewise contain other
information extracted from the present invention, especially the analogies with already
known sequences, and/or information concerning the nucleotidic sequences and/or
polypeptides of other microorganisms for the purpose of facilitating comparative
analysis and exploitation of the results obtained.

Among said recording media, preference is given in particular to those
media readable by a computer, such as magnetic, optical, electrical or hybrid media, in
particular computer disks, CD-ROMs and servers. Such recording media are likewise
an object of the invention.

The recording media according to the invention, with the information they
provide, are very useful for the choice of primers or nucleotidic probes for
determining genes in Legionella pneumophila Paris strain or in strains close to this
organism. Similarly, the utilization of these media for the study of the genetic
polymorphism of a strain close to Legionella pneumophila Paris strain, in particular by
determination of the regions of colinearity, is very useful insofar as these media provide
not only the nucleotidic sequence of the genome of Legionella pneumophila Paris strain,
but likewise the genomic organization in said sequence. Thus, the uses of recording
media according to the invention are likewise objects of the invention.

The homology analysis between different sequences is in fact advantageously
carried out by means of software for sequence comparisons, such as BlastP or
BlastN software, or other software well known to the specialist.

A probe or primer is defined, in the sense of the invention, as being a fragment
of single-strand nucleic acid or a denatured double-strand fragment comprising for
example from 12 bases to several kb, especially from 15 to several hundred bases,
preferably from 15 to 50 or 100 bases, and possessing a hybridization specificity under
defined conditions so as to form a hybridization complex with a nucleic acid target.
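
The length criteria above lend themselves to a simple screening step. The following
sketch is illustrative only: the helper names are hypothetical, the default bounds are
the preferred 15 to 100 bases given in the text, and « hybridization » is reduced to the
presence of an exact reverse-complementary site in the target, which ignores mismatches
and the hybridization conditions on which real specificity depends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_candidate_probe(seq, min_len=15, max_len=100):
    """Check the preferred size range and that only unambiguous bases are present."""
    seq = seq.upper()
    return min_len <= len(seq) <= max_len and set(seq) <= set("ACGT")


def has_exact_target_site(probe, target_strand):
    """True if the probe is perfectly complementary to some site on target_strand.

    Assumption: perfect base pairing only; a simplification of hybridization.
    """
    return reverse_complement(probe) in target_strand.upper()


if __name__ == "__main__":
    target = "TTGACCGTAGGCTAACGTTAGCCATGCAAGT"
    probe = reverse_complement(target[5:25])  # a 20-mer designed against the target
    print(is_candidate_probe(probe), has_exact_target_site(probe, target))
```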

The probes and primers according to the invention can be labeled directly or
indirectly with a radioactive or non-radioactive compound using methods well known to
the specialist, to obtain a detectable and/or quantifiable signal.

The unlabeled sequences of polynucleotides according to the invention can be
utilized directly as probe or primer.

The sequences are generally labeled to obtain sequences utilizable for numerous
applications. The labeling of the primers or probes according to the invention is done
with radioactive elements or with non-radioactive molecules.

Examples of the radioactive isotopes used are 32P, 33P, 35S, 3H or 125I. The
non-radioactive entities are selected among ligands such as biotin, avidin, streptavidin,
digoxigenin, haptens, dyes, and luminescent agents such as radioluminescent,
chemiluminescent, bioluminescent, fluorescent or phosphorescent agents.

The polynucleotides according to the invention can thus be utilized as primer
and/or probe in processes making use in particular of the PCR technique (polymerase
chain reaction) (Rolfs et al., 1991, Berlin: Springer-Verlag). This technique
requires the choice of pairs of oligonucleotide primers flanking the fragment to be
amplified. For example, reference can be made to the technique described in U.S.
patent N° 4,683,202. The amplified fragments can be identified, for example after
electrophoresis in agarose or polyacrylamide gel, or according to a chromatographic
technique such as gel filtration or ion exchange chromatography, then sequenced.
The specificity of the amplification can be checked by using as primers the
nucleotidic sequences of polynucleotides of the invention and, as template, plasmids
containing these sequences or even the derived amplification products. The amplified
nucleotide fragments can be utilized as reagents in hybridization reactions so as to
reveal the presence, in a biological sample, of a nucleic acid target of sequence
complementary to those of said amplified nucleotide fragments.
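
To illustrate what is meant by a pair of primers flanking the fragment to be amplified,
the sketch below predicts the product of such a pair on a single-stranded template
string. It is a simplified model under stated assumptions: exact matching only, the
first occurrence of each site is taken, and the name `predict_amplicon` is hypothetical.
It does not model annealing temperature, mismatches or multiple priming sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predict_amplicon(template, forward, reverse):
    """Return the expected product on the given strand, or None if a site is missing.

    The forward primer is searched as written; the reverse primer anneals to the
    given strand's complement, so its site appears here as its reverse complement.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    template = "TTTT" + "ACGTACGTAC" + "GGGGG" + "CATCATCATC" + "AAAA"
    fwd = "ACGTACGTAC"
    rev = reverse_complement("CATCATCATC")
    print(predict_amplicon(template, fwd, rev))
```

The length of the predicted product is what would be compared with the band observed
after electrophoresis.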

The aim of the invention is likewise the nucleic acids capable of being obtained
by amplification by means of primers according to the invention.

Other amplification techniques for the nucleic acid target can be advantageously
employed as an alternative to PCR (PCR-like) by means of a pair of primers of
nucleotidic sequences according to the invention. PCR-like is understood to mean all
the methods making use of direct or indirect reproduction of nucleic acid sequences,
or else in which the labeling systems have been amplified; these techniques are well
known, and in general involve amplification of DNA by a polymerase; when the original
sample is an RNA, reverse transcription should be carried out beforehand. There are
currently numerous processes enabling this amplification, such as for example the SDA
technique (Strand Displacement Amplification) (Walker et al., 1992, Nucleic Acids
Res. 20:1691), the TAS technique (Transcription-based Amplification System) described
by Kwoh et al. (1989, Proc. Natl. Acad. Sci. USA, 86:1173), the 3SR technique
(Self-Sustained Sequence Replication) described by Guatelli et al. (1990, Proc. Natl.
Acad. Sci. USA 87:1874), the NASBA technique (Nucleic Acid Sequence Based
Amplification) described by Kievits et al. (1991, J. Virol. Methods, 35:273), the TMA
technique (Transcription Mediated Amplification), the LCR technique (Ligase Chain
Reaction) described by Landegren et al. (1988, Science 241:1077), the RCR technique
(Repair Chain Reaction) described by Segev (1992, Kessler C. Springer Verlag, Berlin,
New-York, 197-205), the CPR technique (Cycling Probe Reaction) described by Duck et
al. (1990, Biotechniques, 9:142), and the Q-beta-replicase amplification technique
described by Miele et al. (1983, J. Mol. Biol., 171:281). Certain of these techniques
have since been refined.

In the event that the polynucleotide target to be detected is an mRNA, prior to
use of an amplification reaction by means of the primers according to the invention or
prior to use of a detection process by means of the probes of the invention, an enzyme
of reverse transcriptase type is advantageously used in order to obtain a cDNA from the
mRNA contained in the biological sample. The cDNA obtained will then serve as target
for the primers or the probes used in the process of amplification or detection according
to the invention.

The technique of hybridization of probes can be executed in various ways
(Matthews et al., 1988, Anal. Biochem., 169:1-25). The most general method consists
of immobilizing the nucleic acid extracted from cells of different tissues or cells in
culture on a support (such as nitrocellulose, nylon, polystyrene) and of incubating, in
well-defined conditions, the immobilized nucleic acid target with the probe. After
hybridization, the excess probe is eliminated and the hybrid molecules formed are
detected by the appropriate method (measuring of radioactivity, fluorescence or
enzymatic activity associated with the probe).

In another mode of use, the nucleic probes according to the invention can serve
as capture probes. In this case, a probe known as the «capture probe» is immobilized
on a support and captures, by specific hybridization, the target nucleic acid obtained
from the biological sample to be tested; the target nucleic acid is then detected by
means of a second probe, known as the «detection probe», labeled with an easily
detectable element.

Among the nucleic acid fragments of potential interest, anti-sense
oligonucleotides should be cited in particular, that is, oligonucleotides whose structure
ensures, via hybridization with the target sequence, inhibition of the expression of the
corresponding product. Sense oligonucleotides which, by interacting with proteins
involved in regulating the expression of the corresponding product, cause either
inhibition or activation of this expression should likewise be cited.

Preferably, the probes or primers according to the invention are immobilized on a
support, covalently or non-covalently. In particular, the support can be a DNA chip or
a high-density filter, likewise objects of the present invention.

The interest of using DNA chips or, where appropriate, protein chips in the field of
diagnostics or epidemiology rests, as mentioned previously, on the possibility of
analyzing a large number of sequences simultaneously and very rapidly, for:

- classification or typing of bacteria according to the presence of a sequence
or of a profile of sequences characteristic of the genus, especially Legionella, and in
particular of whether or not the genus is pathogenic, or of the species, especially
Legionella pneumophila, or specific to a bacterium of the Legionella genus and/or of
the Legionella pneumophila species, or specific to a bacterium of the Legionella
pneumophila sub-species Paris strain, or specific to a bacterium of the Legionella
pneumophila sub-species Paris strain relative to the Philadelphia strain and/or Lens
strain, or even specific to a bacterium of the Legionella pneumophila sub-species Paris
and Lens strain relative to the Philadelphia strain, in particular in association with the
severity of the pathologies which such bacteria can induce in case of infection in
mammals, especially in humans; or

- simultaneous comparison of sequences or of profiles of sequences between
different genera, species or strains of bacteria, pathogenic or not, allowing in particular
identification of a gene, or of the corresponding protein sequence, or of a profile of
genes whose presence and/or expression in a bacterium is specific to its genus, its
species, its sub-species or strain, and/or its pathogenicity or lack thereof.
This information is particularly useful for rapidly identifying the presence or absence
of a pathogenic bacterium, the severity of the infection it can cause, the treatment
suited to an infection, and/or the need for, and the means of, decontaminating objects,
circuits or fluids which are or may be contaminated. This information will likewise be
very useful for epidemiological studies relating to this genus of bacteria.
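
By way of illustration only, the typing of a sample from a profile of hybridization
signals can be sketched as follows in Python. The probe names, the signature table and
the intensity threshold are hypothetical values chosen for this example; they are not
taken from the present description.

```python
# Hypothetical signature profiles: 1 = probe expected to hybridize, 0 = not expected.
# Probe names and patterns are invented for illustration only.
SIGNATURES = {
    "Paris": {"probe_genus": 1, "probe_species": 1, "probe_paris": 1, "probe_lens": 0},
    "Lens": {"probe_genus": 1, "probe_species": 1, "probe_paris": 0, "probe_lens": 1},
    "Philadelphia": {"probe_genus": 1, "probe_species": 1, "probe_paris": 0, "probe_lens": 0},
}


def call_spots(intensities, threshold=500.0):
    """Convert raw spot intensities into present (1) / absent (0) calls."""
    return {probe: int(value >= threshold) for probe, value in intensities.items()}


def type_sample(intensities, threshold=500.0):
    """Return the strain whose signature matches the observed profile, or None."""
    observed = call_spots(intensities, threshold)
    for strain, signature in SIGNATURES.items():
        if all(observed.get(probe, 0) == expected for probe, expected in signature.items()):
            return strain
    return None


sample = {"probe_genus": 2100.0, "probe_species": 1800.0,
          "probe_paris": 950.0, "probe_lens": 40.0}
print(type_sample(sample))
```

A real chip would carry many more probes and would require normalization and
replicate spots; the sketch only shows the principle of comparing a presence/absence
profile with reference profiles.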

DNA chip or high-density filter is understood to mean a support on which DNA
sequences are fixed, each of them identifiable by its position on the support. These
chips or filters differ principally in their size, the material of the support, and possibly
the number of DNA sequences fixed thereto.

The probes or primers according to the invention can be fixed on solid
supports, in particular DNA chips, by various fabrication processes. In
particular, in situ synthesis can be carried out by photochemical addressing or by ink
jet. Other techniques consist of carrying out ex situ synthesis and fixing the probes on
the support of the DNA chip by mechanical or electronic addressing, or by ink jet.
These different processes are well known to the specialist.

Indeed, numerous techniques and devices for the analysis of biological samples
have been developed in recent years, in particular for the parallel analysis of multiple
nucleic acids, especially following the development of genomics.

Among these techniques and devices, supports enabling high-throughput analysis of
nucleic acids, such as biochips or DNA chips (also called « micro- or macroarrays »),
have been the object of numerous studies.

These biochips can be made in particular from a support, generally solid and
functionalized, on which given nucleic acids (nucleic probes) are fixed by covalent
bond at defined locations, and to which probes the nucleic acids to be detected or
identified in the biological sample bind specifically, either by pairing (specific
hybridization) or by recognition of an affinity site.

Particular examples of documents describing techniques relating to biochips or DNA
chips are:

- the review article by Wang J. (Nucleic Acids Research, 28, 16:3011-3016,
2000), which summarizes the main known techniques relating to DNA chips;

- the patent document issued under N° US 6,030,782, which describes grafting,
onto a mercaptosilanized surface, of nucleic acids modified by a sulfhydryl or disulfide
group, and the article by Bamdad (Biophysical Journal, 75:1997-2003, 1998), which
describes obtaining DNA-bearing surfaces by incorporation of composite molecules,
DNA-thiols, in self-assembled monolayers (SAMs);

- the international patent application published under N° WO 00/43539, which
proposes immobilizing molecules, such as oligonucleotides, by means of polyfunctional
polymers (« polymer brushes »), thus enabling the grafting density to be increased.
These polymers can be obtained from hydroxyethyl methacrylate, acrylamide or vinyl
pyrrolidone;

- the international patent application published under N° WO 00/36145, which
describes a fabrication method for DNA chips comprising polymerization, on a
substrate of metallic layer type, of a copolymer of pyrrole and functionalized pyrrole,
fixing of a cross-linking agent on the functionalized pyrrole, then fixing of a biological
probe (such as an oligonucleotide). The cross-linking agent can be bifunctional, and for
example have an N-hydroxysuccinimide ester function and a maleimide function;

- the international patent application published under N° WO 98/20020, which
likewise describes the high-density immobilization of nucleic acids on solid supports,
this time by bringing a nucleic acid containing a thiol group into contact with a support
having a group which reacts with this thiol, possibly by way of a cross-linking agent;

- the article by Penchovsky et al (Nucleic Acids Research, 28, 22, e98, 2000),
which describes a method for immobilizing oligonucleotides on aminated latex beads
by means of a cross-linking agent which reacts under the action of light; and

- the international patent applications published under N° WO 99/16907,
WO 00/40593 and WO 00/44939 filed by the company Surmodics (which produces
slides for attaching amine-functionalized oligonucleotides). These applications
describe in particular the fixing of nucleic acids on surfaces such as glass by way of a
polymer backbone bearing one or more « photochemically active » groups on
one side of the polymer (for grafting onto the surface) and « thermochemically active »
groups on the other side (for grafting with the functionalized nucleic acid).

A nucleotidic sequence (probe or primer) according to the invention thus enables
detection and/or amplification of specific nucleic sequences. In particular, detection of
said sequences is made easier when the probe is fixed on a DNA chip or on a
high-density filter.

The use of DNA chips or of high-density filters indeed helps determine the
expression of genes in an organism having a genomic sequence close to the genome of
Legionella pneumophila Paris strain (Collection of the Pasteur Institute CIP 107-629-T).
The genomic sequence of Legionella pneumophila Paris strain, completed by
identification of all the genes of this organism, as presented in the present
invention, serves as a basis for the construction of these DNA chips or filters.

The preparation of these filters or chips consists of synthesizing
oligonucleotides corresponding to the 5' and 3' ends of the genes. These
oligonucleotides are selected using the genomic sequence and its annotations
disclosed by the present invention. The annealing temperature of these oligonucleotides
at the corresponding sites on the DNA must be approximately the same for each
oligonucleotide. This makes it possible to prepare DNA fragments corresponding to
each gene using appropriate PCR conditions in a highly automated environment.
The amplified fragments are then immobilized on filters or supports made of glass,
silicon or synthetic polymers, and these media are used for hybridization.
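
As a rough sketch of how oligonucleotides with approximately the same annealing
temperature might be selected at the two ends of a gene, the following Python example
lengthens each primer until an estimated melting temperature reaches a common target.
It assumes the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only an
approximation for short oligonucleotides; the gene sequence, the target temperature
and the length limits are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(oligo):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def grow_primer(template, target_tm=56, min_len=15, max_len=30):
    """Take the shortest 5' prefix of the template whose estimated Tm reaches target_tm."""
    for length in range(min_len, min(max_len, len(template)) + 1):
        candidate = template[:length]
        if wallace_tm(candidate) >= target_tm:
            return candidate
    return None  # no prefix within the length limits reaches the target


def primer_pair(gene, target_tm=56):
    """Forward primer from the 5' end, reverse primer from the 3' end (opposite strand)."""
    forward = grow_primer(gene, target_tm)
    reverse = grow_primer(reverse_complement(gene), target_tm)
    return forward, reverse


# Hypothetical 45-nucleotide open reading frame used only to exercise the functions.
gene = "ATGGCTAAAGAACTGCGTTTAGATCCGGAAACCATTCTGGCGTAA"
fwd, rev = primer_pair(gene)
print(fwd, wallace_tm(fwd))
print(rev, wallace_tm(rev))
```

Because each added base raises the estimate by 2 or 4 °C, primers grown in this way
end up within a few degrees of one another, which is the property required for running
all amplifications under a single set of PCR conditions.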

The availability of such filters and/or chips and of the corresponding annotated
genomic sequence allows study of the expression of large sets, or of the totality,
of the genes in microorganisms of the Legionella genus or of the Legionella
pneumophila species, by preparing the complementary DNA and hybridizing it to
the DNA or to the oligonucleotides immobilized on the filters or the chips. Likewise,
the filters and/or the chips allow study of the variability of strains or species, by
preparing the DNA of these organisms and hybridizing it to the DNA or to the
oligonucleotides immobilized on the filters or the chips.

The differences between the genomic sequences of the different strains or
species can strongly affect the intensity of the hybridization and, as a consequence,
distort interpretation of the results. It can thus be necessary to have the precise
sequence of the genes of the strain to be studied. The method for detecting genes
described in detail hereinbelow, which involves determining the sequence of random
fragments of a genome and ordering them according to the complete genome
sequence of the Legionella pneumophila strain disclosed in the present invention, can
be very useful for this purpose.

The nucleotide, or protein, sequences according to the invention can likewise be
used in DNA chips or, where appropriate, protein chips for performing mutation analysis.
This analysis is based on the construction of chips, especially DNA chips, capable of
analyzing each base of a nucleotidic sequence according to the invention. For this
purpose, microsequencing techniques on DNA chips could in particular be used. The
mutations are detected by extension of immobilized primers which hybridize to the
template of the sequences analyzed, at a position immediately adjacent to that of the
mutated nucleotide sought. A single-strand template, RNA or DNA, of the sequences to
be analyzed is advantageously prepared according to classic methods, from products
amplified by PCR-type techniques. The single-strand DNA or RNA templates thus
obtained are then deposited on the DNA chip, under conditions enabling their specific
hybridization to the immobilized primers. A thermostable polymerase, for example Tth
or Taq DNA polymerase, specifically extends the 3' end of the immobilized primer
with a labeled nucleotide analog complementary to the nucleotide at the variable
position of the site; for example, thermal cycling is carried out in the presence of
fluorescent dideoxyribonucleotides. The experimental conditions will be adapted
especially to the chips employed, to the immobilized primers, to the polymerases
employed, and to the labeling system selected. An advantage of microsequencing,
relative to techniques based on hybridization of probes, is that it allows all the variable
nucleotides to be identified with optimal discrimination under homogeneous reaction
conditions; when used on DNA chips, it permits optimal resolution and specificity for
routine and industrial multiplex detection of mutations.
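
The principle of microsequencing described above can be sketched in Python as
follows. The sketch assumes exact pairing of the primer with a DNA template and a
single labeled dideoxynucleotide incorporated per primer; the sequences and the
dye assignments are hypothetical and serve only to illustrate the logic.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye assigned to each fluorescent dideoxynucleotide.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template, primer):
    """Return the base added to the 3' end of the primer, or None.

    Both sequences are written 5'->3'. The primer pairs with the template in
    antiparallel orientation, so its 3' end faces the 5'-most base of the
    binding site; the polymerase then copies the template base located just
    5' of that site.
    """
    site = reverse_complement(primer)
    pos = template.find(site)
    if pos <= 0:
        return None  # no pairing, or no template base left to copy
    return COMPLEMENT[template[pos - 1]]


binding_site = "GATTACAGGCATCG"
primer = reverse_complement(binding_site)
wild_type = "CCGTA" + "A" + binding_site   # variable position carries A
mutant = "CCGTA" + "G" + binding_site      # variable position carries G

for name, template in (("wild type", wild_type), ("mutant", mutant)):
    base = extend_primer(template, primer)
    print(name, base, DYE[base])
```

Reading the color of the single incorporated nucleotide thus identifies the base
at the variable position directly, without relying on differences in hybridization
stability.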

The use of high-density filters and/or chips thus provides new knowledge
on the regulation of genes in organisms of industrial importance, and in particular
Legionella bacteria propagated under diverse conditions. It also allows rapid
identification of the differences between the genomes of the strains used in multiple
industrial applications.

In addition, a DNA chip or a filter can be an extremely useful tool for the
determination, detection and/or identification of a microorganism. Therefore, DNA
chips according to the invention which further contain at least one nucleotidic sequence
of a microorganism other than Legionella pneumophila Paris strain, immobilized on the
support of said chip, are likewise preferred. Preferably, the microorganism is
selected from among the species of bacteria of the Legionella genus (hereinbelow
designated at times as bacteria associated with Legionella pneumophila), or the
sub-species of Legionella pneumophila, or again the variants of Legionella pneumophila
Paris strain.

A DNA chip or a filter according to the invention is a very useful element of
certain kits for the detection and/or identification of microorganisms, in
particular bacteria belonging to the Legionella pneumophila species Paris strain or
associated microorganisms, such kits likewise being objects of the invention.

Moreover, the DNA chips or filters according to the invention containing
probes or primers specific to Legionella pneumophila Paris strain, in particular
compared with the Philadelphia strain, are highly advantageous elements of kits
for the detection and/or quantification of the expression of genes of
Legionella pneumophila Paris strain (or of associated microorganisms).

Indeed, control of the expression of genes is a critical point for optimizing the
growth and yield of a strain, either by allowing the expression of one or more novel
genes, or by modifying the expression of genes already present in the cell. The present
invention provides all the sequences naturally active in Legionella pneumophila Paris
strain enabling the expression of the genes. It thus allows all the sequences expressed in
Legionella pneumophila Paris strain to be determined. It likewise provides a tool for
locating the genes whose expression follows a given pattern. To achieve this, the
DNA of all or part of the genes of Legionella pneumophila Paris strain can be amplified
using primers according to the invention, then fixed to a support, such as for
example glass, nylon or a DNA chip, so as to construct a tool allowing the expression
profile of these genes to be followed. This tool, constituted by the support carrying
the coding sequences, serves as a hybridization matrix for a mixture of labeled molecules
reflecting the messenger RNAs expressed in the cell (in particular the labeled probes
according to the invention). By repeating this experiment at different times and
combining all the data via suitable processing, the expression profiles of all these
genes are obtained. Knowledge of the sequences which follow a given regulation
pattern can also be of benefit for searching directly, for example by homology, for other
sequences following a broadly similar, though slightly different, regulation pattern. In
addition, it is possible to isolate each control sequence present upstream of
the segments serving as probes and to follow its activity by
means of a suitable reporter gene (luciferase, β-galactosidase, GFP for « Green
Fluorescent Protein »). These isolated sequences can then be modified and assembled
by metabolic engineering with sequences of interest with a view to their optimal
expression.
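
One possible form of the "suitable processing" mentioned above is to compare the
time course of each gene with a reference pattern. The following Python sketch uses
the Pearson correlation coefficient for this purpose; the gene names, the signal
values and the correlation threshold are hypothetical, and other measures of
similarity could equally be used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def genes_following(pattern, profiles, min_r=0.9):
    """Return, in alphabetical order, the genes whose profile correlates with the pattern."""
    return sorted(gene for gene, series in profiles.items()
                  if pearson(pattern, series) >= min_r)


# Hypothetical hybridization signals measured at four successive times.
profiles = {
    "geneA": [1, 2, 4, 8],
    "geneB": [8, 4, 2, 1],
    "geneC": [2, 4, 8, 16],
    "geneD": [5, 5, 5, 5],
}
print(genes_following([1, 2, 4, 8], profiles))
```

Genes selected in this way share a regulation pattern and are therefore candidates
for the homology searches and the analysis of upstream control sequences described
above.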

The invention likewise relates to cloning and/or expression vectors
which contain a nucleotidic sequence according to the invention. In particular, those
nucleotidic sequences are preferred which code for polypeptides of the cellular
envelope or surface, or involved in the cellular machinery, in particular secretion, central
intermediary metabolism, in particular production of sugars, energy metabolism, the
synthesis of Vitamin B12, transcription and translation, and synthesis of
polypeptides.

The vectors according to the invention preferably comprise elements which
permit the expression and/or secretion of the nucleotidic sequences in a given host
cell.

The vector must comprise a promoter, signals for initiation and termination of
translation, as well as appropriate regions for regulation of transcription. It must be
able to be stably maintained in the host cell and can optionally have particular signals
which specify secretion of the translated protein. These different elements are selected
and optimized by the specialist according to the cellular host used. To this effect, the
nucleotide sequences according to the invention can be inserted into vectors which
replicate autonomously within the selected host, or into vectors which integrate into the
genome of the selected host.

Such vectors are prepared by methods commonly used by the specialist, and
the resulting clones can be introduced into an appropriate host using standard methods,
such as lipofection, electroporation, heat shock, or chemical methods.

The vectors according to the invention are for example vectors of plasmidic or 
viral origin. They are useful for transforming host cells so as to clone or express the 
nucleotidic sequences according to the invention. 

The invention likewise comprises the host cells transformed by a vector 
according to the invention. 

The cellular host can be selected from amongst prokaryotic or eukaryotic 
systems, for example bacterial cells, but likewise yeast cells or animal cells, in 
particular the cells of mammals. The cells of insects or plant cells can likewise be used 
here. The host cells preferred according to the invention are in particular prokaryotic 
cells, preferably bacteria such as E. coli, or again belonging to the Legionella genre, to 
the Legionella pneumophila species Paris strain, or microorganisms associated with the 
Legionella pneumophila species Paris strain. 

The invention relates likewise to animals, except humans, which comprise a cell
transformed according to the invention. The transformed cells according to the
invention can be used in processes for preparing recombinant polypeptides according
to the invention. Processes for preparing a polypeptide according to the invention
in recombinant form, characterized in that they use a vector and/or a cell transformed
by a vector according to the invention, are themselves included in the present invention.
Preferably, a cell transformed by a vector according to the invention is cultivated under
conditions permitting expression of said polypeptide, and said recombinant polypeptide
is recovered. The host cells according to the invention can likewise be used for the
preparation of nutritive compositions, themselves an object of the present invention.

As was mentioned, the cellular host can be selected from amongst prokaryotic or
eukaryotic systems. In particular, it is possible to identify nucleotidic sequences
according to the invention which facilitate secretion in such a prokaryotic or eukaryotic
system. A vector according to the invention carrying such a sequence can thus be used
advantageously for the production of recombinant proteins intended to be secreted.
Indeed, purification of these recombinant proteins of interest is facilitated by the fact
that they are present in the supernatant of the cell culture rather than inside the host
cells.

The polypeptides according to the invention can likewise be prepared by
chemical synthesis. Such a preparation process is likewise an object of the invention.
The specialist is aware of chemical synthesis processes, for example techniques using
solid phases (see especially Stewart et al., 1984, Solid phase peptide synthesis, Pierce
Chem. Company, Rockford, Ill., 2nd ed.) or techniques using partial solid
phases, by condensation of fragments or by classic synthesis in solution. The
polypeptides obtained by chemical synthesis, which may comprise corresponding
non-natural amino acids, are likewise included in the invention.

The invention further relates to hybrid polypeptides having at least one
polypeptide according to the invention or one of its fragments, and a sequence of a
polypeptide capable of inducing an immune response in humans or animals.

Advantageously, the antigenic determinant is such that it is capable of inducing
a humoral and/or cellular response.

Such a determinant could comprise a polypeptide according to the invention, or one
of its fragments, in glycosylated form, used with a view to obtaining
immunogenic compositions capable of inducing synthesis of antibodies directed against
multiple epitopes. Said glycosylated polypeptides or fragments are likewise part of the
invention.

These hybrid molecules can be made up in part of a carrier molecule of
polypeptides or of their fragments according to the invention, associated with a possibly
immunogenic part, in particular an epitope of diphtheria toxin, tetanus toxin, a
surface antigen of the hepatitis B virus (patent FR 79 21811), the VP1 antigen of the
poliomyelitis virus or any other toxin or viral or bacterial antigen.

The synthesis processes of hybrid molecules encompass the methods utilized in 
genetics to construct hybrid nucleotidic sequences coding for the desired polypeptide 
sequences. For example, reference can be made advantageously to the technique for 
obtaining genes coding for fusion proteins described by Minton in 1984. 

Said hybrid nucleotidic sequences coding for a hybrid polypeptide, as well as the
hybrid polypeptides according to the invention characterized in that these are
recombinant polypeptides obtained by the expression of said hybrid nucleotidic
sequences, are likewise part of the invention.

The invention likewise comprises the vectors characterized in that they contain
one of said hybrid nucleotide sequences. The host cells transformed by said vectors, the
transgenic animals comprising one of said transformed cells as well as the preparation
processes for recombinant polypeptides utilizing said vectors, said transformed cells
and/or said transgenic animals are likewise naturally part of the invention.

The coupling between a polypeptide according to the invention and an
immunogenic polypeptide can be made chemically or biologically. Thus,
according to the invention, it is possible to introduce one or more binding element(s),
especially amino acids, to facilitate the coupling reactions between the polypeptide
according to the invention and the immunostimulatory polypeptide; the covalent
coupling of the immunostimulatory antigen can be made at the N- or C-terminal end
of the polypeptide according to the invention. The bifunctional reagents allowing this
coupling are chosen according to the end selected for making this coupling, and
the coupling techniques are well known to the specialist.

The conjugates originating from a coupling of peptides can likewise be prepared 
by genetic recombination. The (conjugated) hybrid peptide can in effect be produced by 
recombinant DNA techniques, by insertion in or addition to the DNA sequence coding 
for the polypeptide according to the invention, of a sequence coding for the antigenic, 
immunogenic or haptenic peptide(s). These preparation techniques for hybrid peptides 
by genetic recombination are well known to the specialist (see for example Makrides, 
1996, Microbiological Reviews 60:512-538). 

Preferably, said immunogenic polypeptide is selected from the group of peptides
comprising toxoids, especially diphtheria toxoid or tetanus toxoid, proteins derived
from Streptococcus (such as the human serum albumin binding protein), OMPA
membrane proteins and outer membrane protein complexes, outer membrane
vesicles or heat shock proteins.

The hybrid polypeptides according to the invention are very useful for obtaining
monoclonal or polyclonal antibodies capable of specifically recognizing the
polypeptides according to the invention. Indeed, a hybrid polypeptide according to the
invention allows potentiation of the immune response against the polypeptide
according to the invention coupled to the immunogenic molecule. Such monoclonal or
polyclonal antibodies, their fragments, or chimeric antibodies, recognizing the
polypeptides according to the invention, are likewise objects of the invention.

The specific monoclonal antibodies can be obtained according to the classic
method of hybridoma culture described by Kohler and Milstein (1975, Nature 256,
495).

The antibodies according to the invention are for example chimeric antibodies,
humanized antibodies, Fab or F(ab')2 fragments. They can likewise be in the form of
immunoconjugates or labeled antibodies so as to provide a detectable and/or quantifiable
signal.

Therefore, the antibodies according to the invention can be employed in a
process for detection and/or identification of bacteria belonging to the Legionella
pneumophila species Paris and/or Lens strain and/or Philadelphia strain, or of an
associated microorganism in a biological sample, characterized in that it comprises the
following stages:

a) contacting the biological sample with an antibody according to the invention;

b) revealing any antigen-antibody complex formed.

The antibodies according to the present invention can likewise be used for
detecting expression of a gene of Legionella pneumophila Paris and/or Lens strain
and/or Philadelphia strain, or of associated microorganisms. Indeed, the presence of
the expression product of a gene recognized by an antibody specific to said expression
product can be detected by the presence of an antigen-antibody complex formed after
contact of the Legionella pneumophila Paris, Lens or Philadelphia strain, or of
the associated microorganism, with an antibody according to the invention. The bacterial
strain used may have been « prepared », that is, centrifuged, lysed and placed in an
appropriate reagent constituting a medium suitable for the immunological reaction.
In particular, a process for detecting expression of the gene corresponding
to a Western blot is preferred, which can be carried out after electrophoresis on
polyacrylamide gel of a lysate of the bacterial strain, in the presence or absence of
reducing conditions (SDS-PAGE). After migration and separation of the proteins on the
polyacrylamide gel, said proteins are transferred to an appropriate membrane (for
example made of nylon) and the presence of the protein or polypeptide of interest
is detected by contacting said membrane with an antibody according to the invention.

Therefore, the present invention likewise comprises kits for
implementing a process such as described (detection of the expression of a gene of
Legionella pneumophila Paris and/or Lens strain and/or Philadelphia strain, or of an
associated microorganism, or detection and/or identification of bacteria belonging to
the Legionella pneumophila species Paris and/or Lens and/or Philadelphia strain, or of an
associated microorganism), comprising the following elements:

a) a polyclonal or monoclonal antibody according to the invention;

b) optionally, the reagents constituting the medium suitable for the
immunological reaction;

c) optionally, the reagents allowing detection of the antigen-antibody complexes
produced by the immunological reaction.

The polypeptides and the antibodies according to the invention can 
advantageously be immobilized on a support, especially a protein chip. This type of 
protein chip is an object of the invention, and can likewise contain at least one 
polypeptide of a microorganism other than Legionella pneumophila Paris and/or Lens 
strain and/or Philadelphia strain, or an antibody directed against a compound of a 
microorganism other than Legionella pneumophila Paris and/or Lens and/or 
Philadelphia strain. 

The protein chips or high-density filters containing proteins according to the
invention can be constructed in the same way as the DNA chips according to the
invention. In practice, the polypeptides can be synthesized directly on the protein chip,
or synthesized ex situ and then fixed on said chip.
This latter method is preferable when proteins of significant size, which are
advantageously prepared by genetic engineering, are to be fixed on the support.
However, if only peptides are to be fixed on the support of said chip, it can be
more advantageous to synthesize said peptides directly in situ.

The protein chips according to the invention can be advantageously used in
kits for the detection and/or identification of bacteria associated with the
Legionella pneumophila species Paris and/or Lens strain and/or Philadelphia strain or
with a microorganism, and more generally in kits for the detection and/or
identification of microorganisms. When the polypeptides according to the invention are
fixed on the protein chips, the presence of antibodies is searched for in the samples
tested, whereas fixing an antibody according to the invention on the support of the
protein chip allows identification of the protein for which said antibody is specific.

Preferably, an antibody according to the invention is fixed on the support of the
protein chip, and the presence of the corresponding antigen, specific to Legionella
pneumophila Paris and/or Lens strain and/or Philadelphia strain or to an associated
microorganism, is detected.

A protein chip described hereinabove can be used for the detection of
gene products, for establishing an expression profile of said genes, as a complement
to a DNA chip according to the invention.

The protein chips according to the invention are likewise extremely useful for 
proteomic experiments, which study the interactions between the different proteins of 
a given microorganism. In a simplified manner, peptides representative of the different 
proteins of an organism are fixed on a support. Next, said support is put in contact with 
labeled proteins, and after an optional rinsing stage, interactions are detected between 
said labeled proteins and the peptides fixed on the protein chip. 

Therefore, the protein chips comprising a polypeptidic sequence according to the 
invention or an antibody according to the invention are an object of the invention, as 
well as kits or sets containing them. 

The present invention likewise covers a process for detection and/or 
identification of bacteria belonging to the Legionella pneumophila species Paris and/or 
Lens and/or Philadelphia strain, or to an associated microorganism in a biological 
sample, which uses a nucleotidic sequence according to the invention. 

It should be understood that the term biological sample relates in the present 
invention to samples taken from a living organism (in particular blood, tissue, 
organs or the like taken from a mammal) or to a sample containing biological material, 
that is, DNA. Such a biological sample especially includes all fluids (liquid or air-borne), or 
any object, such as conduits for fluid, filters for fluids, or any object capable of being 
involved in the fluid supply of buildings, nutritional compositions containing bacteria, or 
other. 

The process for detection and/or identification utilizing the nucleotidic 
sequences according to the invention can be diverse in nature. 

A process comprising the following stages is preferred: 

a) possibly, isolation of DNA from the biological sample to be analyzed, or 
obtaining cDNA from the RNA of the biological sample; 

b) specific amplification of the DNA of bacteria belonging to the 
Legionella pneumophila species Paris, Lens or Philadelphia strain, or to an associated 
microorganism, by means of at least one primer according to the invention; 

c) revealing amplification products. 
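
As an illustration only, stages b) and c) can be mimicked in silico: a primer pair is expected to give a product when the forward primer matches one strand and the reverse primer matches the opposite strand further downstream. The sketch below assumes exact primer matching and a single linear template; the function names and the sequences are hypothetical and are not primers or sequences of the invention.

```python
from typing import Optional

# Hypothetical helper names; exact matching only, no mismatches or melting temperatures.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted amplicon, or None if no product is expected.

    The forward primer must occur on the given strand; the reverse primer
    anneals to that strand, so its reverse complement must occur on it,
    downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None  # forward primer finds no site: no amplification
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None  # reverse primer finds no downstream site
    return template[start:end + len(reverse_site)]


# Made-up template and primers, for illustration only.
template = "TTACGTACGGATCCAAGCTTGGCATGCAA"
amplicon = in_silico_pcr(template, "ACGTACG", "GCATGCC")
print(amplicon)
```

A sample whose DNA lacks either primer site returns None, which corresponds to the absence of a revealed amplification product at stage c).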

This process is based on specific amplification of DNA, in particular by a polymerase 
chain reaction (PCR). 

Likewise, a process comprising the following stages is preferred: 

a) contacting a nucleotidic probe according to the invention with a 
biological sample, the nucleic acid contained in the biological sample having, if 
required, previously been made accessible to hybridization, in conditions permitting 
hybridization of the probe to the nucleic acid of bacteria belonging to the Legionella 
pneumophila species Paris strain, or to an associated microorganism; 

b) revealing the hybrid possibly formed between the nucleotidic probe and 
the DNA of the biological sample. 

Such a process is not limited to detection of the presence of the DNA 
contained in the biological sample tested, and it can likewise be used for detecting the 
RNA contained in said sample. This process in particular includes Southern and 
Northern blots. 
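
The principle of stage a) can be sketched computationally: a probe hybridizes when its reverse complement occurs in the target strand, and an RNA target can be handled by reading U as T. This is a simplified model under stated assumptions (single strand, fixed mismatch tolerance, no thermodynamics); the function name and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_hybridizes(probe: str, target: str, max_mismatches: int = 0) -> bool:
    """Return True if the probe can base-pair with some window of the target strand."""
    # RNA targets (Northern-type detection) are treated like DNA by mapping U to T.
    target = target.upper().replace("U", "T")
    probe = probe.upper().replace("U", "T")
    # The probe pairs with the stretch of target equal to its reverse complement.
    site = probe.translate(_COMPLEMENT)[::-1]
    width = len(site)
    for i in range(len(target) - width + 1):
        window = target[i:i + width]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            return True
    return False


# Made-up sequences, for illustration only.
print(probe_hybridizes("TGTAATC", "GGATCCGATTACAGGC"))  # DNA target
print(probe_hybridizes("TGTAATC", "GGAUCCGAUUACAGGC"))  # RNA target
```

Raising max_mismatches loosely plays the role of lowering hybridization stringency.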

Another preferred process according to the invention comprises the following 
stages: 

a) contacting a nucleotidic probe according to the invention, immobilized on a 
support, with a biological sample, the nucleic acid of the sample having, if required, 
been previously made accessible to hybridization, in conditions allowing hybridization 
of the probe to the nucleic acid of bacteria belonging to the Legionella pneumophila 
species Paris strain or to an associated microorganism; 

b) contacting the hybrid formed between the nucleotidic probe immobilized 
on a support and the nucleic acid contained in the biological sample, if required after 
elimination of the DNA from the biological sample not having hybridized with the 
probe, with a labeled nucleotidic probe according to the invention; 

c) revealing the novel hybrid formed at stage b). 

This process is advantageously utilized with a DNA chip according to the 
invention, the desired nucleic acid hybridizing with a probe present on the surface of 
said chip, and being detected by utilization of a labeled probe. This process is 
advantageously implemented in combination with a prior amplification stage of DNA or of 
complementary DNA, possibly obtained by reverse transcription, by means of primers 
according to the invention. 

Therefore, the present invention likewise includes kits or sets for the 
detection and/or identification of bacteria belonging to the Legionella pneumophila 
species Paris and/or Lens and/or Philadelphia strain, or to an associated microorganism, 
characterized in that they comprise the following elements: 

a) a nucleotidic probe according to the invention; 

b) possibly, the reagents necessary for carrying out a hybridization reaction; 

c) possibly, at least one primer according to the invention as well as the 
reagents necessary for an amplification reaction of the DNA. 

Similarly, the present invention likewise includes kits or sets for the 
detection and/or identification of bacteria belonging to the Legionella pneumophila 
species Paris strain or to an associated microorganism, characterized in that they comprise 
the following elements: 

a) a nucleotidic probe, known as capture probe, according to the invention; 

b) an oligonucleotidic probe, known as revelation probe, according to the 
invention; 

c) possibly, at least one primer according to the invention as well as the 
reagents necessary for an amplification reaction of the DNA. 

Finally, kits or sets for the detection and/or identification of bacteria 
belonging to the Legionella pneumophila species Paris and/or Lens and/or Philadelphia 
strain, or to an associated microorganism, characterized in that they comprise the 
following elements: 

a) at least one primer according to the invention; 

b) possibly, the reagents necessary for performing an amplification reaction 
of DNA; 

c) possibly, a compound enabling the sequence of the amplified fragment, 
more particularly an oligonucleotidic probe according to the invention, to be verified, 

are likewise objects of the present invention. 

Preferably, said primers and/or probes and/or polypeptides and/or antibodies 
according to the present invention utilized in the processes and/or kits or sets 
according to the present invention are selected from amongst the primers and/or probes 
and/or polypeptides and/or antibodies specific to the Legionella pneumophila species 
Paris and/or Lens and/or Philadelphia strain. In a preferred manner these elements are 
selected from amongst the nucleotidic sequences coding for a secreted protein, among 
the secreted polypeptides, or among the antibodies directed against exported 
polypeptides, such as those involved in the wall or the cellular envelope of Legionella 
pneumophila Paris and/or Lens and/or Philadelphia strain. 

The object of the present invention is likewise the strains of Legionella 
pneumophila Paris or Lens strain, and/or associated microorganisms, containing one or 
more mutation(s), amounting to less than 10 % mutation (or less again, as cited for the 
modifications of polypeptides), in a nucleotidic sequence according to the invention, in 
particular an ORF sequence, or in its regulatory elements (in particular promoters). 
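
One way to read the 10 % limit, offered here purely as an assumption, is as the percentage of differing positions between a reference sequence and a mutated sequence of the same length, with at least one difference required. The sketch below implements that reading for pre-aligned sequences; the function names and example sequences are hypothetical.

```python
def percent_mutation(reference: str, variant: str) -> float:
    """Percentage of positions that differ between two pre-aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(reference.upper(), variant.upper()))
    return 100.0 * differences / len(reference)


def within_mutation_limit(reference: str, variant: str, limit_percent: float = 10.0) -> bool:
    """True if the variant carries at least one mutation but less than the limit."""
    return 0.0 < percent_mutation(reference, variant) < limit_percent


# Made-up 20-nucleotide sequences, for illustration only.
reference = "ATGGCTAGCTAGGCTAACGT"
one_change = "ATGGCTAGCTAGGCTAACGA"
print(percent_mutation(reference, one_change), within_mutation_limit(reference, one_change))
```

Insertions and deletions would require a real alignment step first, which this sketch deliberately leaves out.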

According to the present invention, the strains of Legionella pneumophila Paris 
or Lens strain having one or more mutation(s) in the nucleotidic sequences coding for 
polypeptides involved in the cellular machinery, in particular secretion, central 
intermediary metabolism, energy metabolism, the synthesis of amino acids, 
transcription and translation, and synthesis of the polypeptides, are preferred. 

Said mutations can lead to inactivation of the gene or, in particular when they 
are situated in the regulatory elements of said gene, to overexpression of the latter. 

The invention relates further to a method for selecting compounds capable 
of inhibiting the expression of genes involved in the biosynthesis of polysaccharides 
of the cellular envelope of bacteria of the Legionella pneumophila species Paris 
strain, characterized in that it comprises the following stages: 

a) contacting said compound with a bacterium of said Paris strain, said 
bacterium being in conditions and in a medium appropriate to its culture; 

b) determination of the capacity of said compound to inhibit the expression 
of the genes coding for the proteins of SEQ ID Nos. 1126, 3218, 288, 632, 917, 1503, 
1555, 1877, 1928, 1963, 2204, 2212, 2243, 2324, 2378, 2410, 2411; 

c) by means of a process according to the invention in which said antibody 
is directed specifically against a polypeptide involved in the biosynthesis of the 
polysaccharides, or by means of a process according to the invention in which the 
probes or primers are specific to a nucleic sequence coding for a polypeptide involved in 
the biosynthesis of the polysaccharides; 

d) selection of an organic or inorganic compound capable of modulating, 
regulating, inducing or inhibiting the expression of genes, and/or of modifying the 
cellular replication of eukaryotic or prokaryotic cells, or capable of inducing, inhibiting 
or aggravating the pathologies associated with infection by Legionella pneumophila 
Paris strain or one of its associated microorganisms. 

The invention likewise comprises a method for selection of compounds capable 
of binding to a polypeptide or one of its fragments according to the invention, capable 
of binding to a nucleotidic sequence according to the invention, or capable of 
recognizing an antibody according to the invention, and/or capable of modulating, 
regulating, inducing or inhibiting the expression of genes, and/or modifying the growth or 
cellular replication of eukaryotic or prokaryotic cells, or capable of inducing, inhibiting or 
aggravating in an animal or human organism the pathologies linked to infection by 
Legionella pneumophila Paris strain or Lens or Philadelphia strains, or one of its 
associated microorganisms, characterized in that it comprises the following stages: 

a) contacting said compound with said polypeptide, said nucleotidic 
sequence, or a cell transformed according to the invention, and/or administration of 
said compound to an animal transformed according to the invention; 

b) determination of the capacity of said compound to bind to said 
polypeptide or said nucleotidic sequence, or to modulate, regulate, induce or inhibit the 
expression of genes, or to modulate the growth or cellular replication, or to induce, inhibit 
or aggravate in said transformed animal the pathologies linked to infection by 
Legionella pneumophila Paris strain or Lens or Philadelphia strains, or one of its 
associated microorganisms. 

The cells and/or the animals transformed according to the invention could 
advantageously serve as a model and be utilized in processes for studying, identifying 
and/or selecting compounds capable of being responsible for pathologies induced or 
aggravated by Legionella pneumophila Paris strain or Lens or Philadelphia strains, 
or capable of preventing and/or treating these pathologies such as for example 
genital, ocular or systemic diseases, especially of the lymphatic system. In particular, 
the transformed host cells, especially the Legionellae bacteria whose 
transformation by a vector according to the invention can for example increase or 
inhibit their infectious capacity, or modulate the pathologies habitually induced or 
aggravated by the infection, could be utilized for infecting animals in which the 
appearance of pathologies will be followed. These non-transformed animals, infected for 
example with transformed Legionellae bacteria, will be able to serve as a study model. 
In the same manner, the animals transformed according to the invention will be able to 
be utilized in selection processes for compounds capable of preventing and/or treating 
diseases due to Legionella. Said processes utilizing said transformed cells and/or 
transformed animals are part of the invention. 

The compounds capable of being selected can be organic compounds such as 
polypeptides or carbohydrates or any other already known organic or inorganic 
compounds, or novel organic compounds designed by molecular modeling techniques 
and obtained by chemical or biochemical synthesis, these techniques being known to the 
specialist. 




Said selected compounds will be able to be utilized for modulating the growth 
and/or cellular replication of Legionella pneumophila Paris and/or Lens and/or 
Philadelphia strain, or any other associated microorganism, and thus to control infection 
by these microorganisms. Said compounds according to the invention will likewise be 
utilized for modulating the growth and/or cellular replication of all eukaryotic or 
prokaryotic cells, especially tumoral cells and infectious microorganisms, for which 
said compounds will prove to be active, the methods for determining said modulations 
being well known to the specialist. 

Compound capable of modulating the growth of a microorganism is understood 
to mean any compound making it possible to intervene in, modify, limit and/or reduce the 
development, growth, rate of proliferation and/or the viability of said microorganism. 

This modulation can be realized for example by an agent capable of binding to a 
protein and thus inhibiting or potentiating its biological activity, or capable of binding to a 
membrane protein of the external surface of a microorganism and blocking 
penetration of said microorganism into the host cell or promoting the action of the 
immune system of the infected organism directed against said microorganism. This 
modulation can likewise be realized by an agent capable of binding to a nucleotidic 
sequence of DNA or RNA of a microorganism and for example blocking the expression 
of a polypeptide whose biological or structural activity is necessary to the growth 
or to the reproduction of said microorganism. 

For these screening methods, associated microorganism in the present 
selection method is understood to mean any microorganism whose gene 
expression can be modulated, regulated, induced or inhibited, or whose growth or 
cellular replication can likewise be modulated, by a compound of the invention. 
Likewise, associated microorganism in the present invention is understood to mean any 
microorganism comprising nucleotidic sequences or polypeptides according to the 
invention. These microorganisms, which can in certain cases comprise polypeptides or 
nucleotidic sequences identical or homologous to those of the invention, can likewise be 
detected and/or identified by the detection and/or identification processes or 
kits according to the invention, and likewise serve as targets for the compounds of the 
invention. 

The invention relates to the compounds capable of being selected by a selection 
method according to the invention. 

The invention likewise relates to a pharmaceutical composition comprising a 
compound selected from amongst the following compounds: 

a) a nucleotidic sequence according to the invention; 

b) a polypeptide according to the invention; 

c) a vector according to the invention; 

d) an antibody according to the invention; and 

e) a compound capable of being selected by a selection method according 
to the invention, possibly in association with a pharmaceutically acceptable vehicle. 

Efficacious quantity is understood to mean an adequate quantity of said 
compound or antibodies, or of polypeptide of the invention, making it possible to modulate the 
growth of Legionella pneumophila Paris and/or Lens and/or Philadelphia strain, or of an 
associated microorganism. 

The invention also relates to a pharmaceutical composition according to the 
invention for the prevention or treatment of an infection by a bacterium belonging to the 
Legionella pneumophila species Paris strain or Lens or Philadelphia strains, or by an 
associated microorganism. 

A further aim of the invention is an immunogenic and/or vaccinal 
composition, characterized in that it comprises one or more polypeptides according to 
the invention and/or one or more hybrid polypeptides according to the invention. 

The invention also comprises utilization of a cell transformed according to the 
invention, for the preparation of a vaccinal composition. 

The aim of the invention likewise is a vaccinal composition, characterized in 
that it contains a nucleotidic sequence according to the invention, a vector according to 
the invention and/or a cell transformed according to the invention. 

The invention likewise relates to vaccinal compositions according to the 
invention, for the prevention or treatment of an infection by a bacterium belonging to the 
Legionella pneumophila species Paris strain or Lens or Philadelphia strains, or by an 
associated microorganism. 

In a preferred manner, the immunogenic and/or vaccinal compositions according 
to the invention for preventing and/or treating infection by Legionella pneumophila 
Paris strain or Lens or Philadelphia strains, or by an associated microorganism, will be 
selected from among the immunogenic and/or vaccinal compositions comprising a 
polypeptide or one of its fragments corresponding to a protein, or one of its fragments, 
of the cellular envelope of Legionella pneumophila Paris strain or Lens or Philadelphia 
strains. The vaccinal compositions comprising nucleotidic sequences will preferably 
likewise comprise nucleotidic sequences coding for a polypeptide or one of its 
fragments corresponding to a protein, or one of its fragments, of the cellular envelope of 
Legionella pneumophila Paris strain or Lens or Philadelphia strains. 

Of these preferred immunogenic and/or vaccinal compositions, the most 
preferred are those comprising a polypeptide or one of its fragments, or a nucleotidic 
sequence or one of its fragments, whose sequences are selected from among the 
nucleotidic or amino acid sequences identified in this functional group and listed 
previously. 

The polypeptides of the invention or their fragments entering the immunogenic 
compositions according to the invention can be selected by techniques known to the 
specialist, such as for example on the capacity of said polypeptides to stimulate the T 
cells, which is reflected for example in their proliferation or the secretion of interleukins, 
and which results in the production of antibodies directed against said 
polypeptides. 

In mice, to which a weight-based dose of the vaccinal composition comparable to the 
dose utilized in humans is administered, the antibody response is tested by taking serum 
and then studying the formation of a complex between the antibodies present in the 
serum and the antigen of the vaccinal composition, according to customary techniques. 

According to the present invention, said vaccinal compositions will preferably 
be in association with a pharmaceutically acceptable vehicle and, if required, with one 
or more appropriate immunity adjuvants. 

These days, diverse types of vaccines are available for protecting humans 
against infectious diseases: attenuated living microorganisms (M. bovis - BCG for 
tuberculosis), inactivated microorganisms (flu virus), acellular extracts (Bordetella 
pertussis for pertussis), recombinant proteins (surface antigen of the hepatitis B virus), 
polysaccharides (pneumococci). Vaccines prepared from synthetic peptides or genetically 
modified microorganisms expressing heterologous antigens are in experimentation. Still 
more recently, recombinant plasmid DNAs carrying genes coding for protective antigens were 
proposed as an alternative vaccinal strategy. This type of vaccination is realized with a 
particular plasmid deriving from a plasmid of E. coli which does not replicate in vivo 
and which codes solely for the vaccinating protein. Animals were immunized by simply 
injecting naked plasmid DNA into the muscle. This technique leads to expression of 
the vaccinal protein in situ and to an immune response of cellular type (CTL) and of 
humoral type (antibodies). This double induction of the immune response is one of the 
principal advantages of the vaccination technique with naked DNA. 

The vaccinal compositions comprising nucleotide sequences or vectors in which 
said sequences are inserted are described in particular in international application 
N° WO 90/11092 and likewise in international application N° WO 95/11307. 

The nucleotidic sequence making up the vaccinal composition according to the 
invention can be injected into the host after having been coupled to compounds which 
promote penetration of this polynucleotide inside the cell or its transport as far as the 
cellular nucleus. The resulting conjugates can be encapsulated in polymer 
microparticles, as described in international application N° WO 94/27238 (Medisorb 
Technologies International). 

According to another embodiment of the vaccinal composition according to the 
invention, the nucleotidic sequence, preferably a DNA, is complexed with 
DEAE-dextran, with nuclear proteins, with lipids or encapsulated in liposomes or introduced in 
the form of a gel facilitating its transfection into the cells. The polynucleotide or the 
vector according to the invention can also be in suspension in a buffer solution or be 
associated with liposomes. 

Advantageously, such a vaccine will be prepared according to the technique 
described by Tascon et al. or Huygen et al. in 1996 or again according to the technique 
described by Davis et al. in the international application N° WO 95/11307. 

Such a vaccine can likewise be prepared in the form of a composition containing 
a vector according to the invention, placed under the control of regulation elements 
allowing its expression in humans or animals. For example, for the polypeptide antigen of 
interest, the pcDNA3 plasmid or the pcDNAI/neo plasmid, both marketed by Invitrogen 
(R&D Systems, Abingdon, UK), could be utilized as an in vivo expression vector. 
Such a vaccine will advantageously comprise, apart from the recombinant vector, a 
saline solution, for example a sodium chloride solution. 

Pharmaceutically acceptable vehicle is understood to mean a compound or a 
combination of compounds entering a pharmaceutical or vaccinal composition, not 
causing secondary reactions, and which enables for example ease of administration of the 
active compound, an increase in its lifetime and/or its efficacy in the organism, 
augmentation of its solubility in solution or again an improvement in its preservation. 
These pharmaceutically acceptable vehicles are well known and will be adapted by the 
specialist as a function of the nature and mode of administration of the selected active 
compound. 

As for vaccinal formulations, these can comprise appropriate immunity 
adjuvants which are known to the specialist, such as for example aluminum hydroxide, 
a representative of the family of muramyl peptides such as one of the peptidic derivatives of 
N-acetyl-muramyl, a bacterial lysate, or even incomplete Freund's adjuvant. 

Preferably, these compounds will be administered systemically, in particular 
intravenously, intramuscularly, intradermally or subcutaneously, or orally. In a more 
preferred way, the vaccinal composition comprising polypeptides according to the 
invention will be administered in several doses, spread out over time, intradermally or 
subcutaneously. 

Their administration methods, dosages and optimal galenic forms can be 
determined according to the criteria generally taken into account in setting up a treatment 
adapted to a patient, such as for example age or body weight of the patient, the 
seriousness of the general status, tolerance to treatment and the side effects. 

The invention comprises utilizing a composition according to the invention for the 
treatment or prevention of diseases brought on or aggravated by the presence of Legionella 
pneumophila Paris strain or Lens or Philadelphia strains. 

The invention comprises the utilization of a composition according to the 
invention for the treatment or prevention of systemic diseases, induced or aggravated by 
the presence of Legionella pneumophila Paris strain or Lens or Philadelphia strains. 

Additionally, an object of the present invention likewise is a genomic DNA bank 
of a bacterium of the species Legionella pneumophila Paris strain, characterized in that 
this is the bank deposited with the CNCM on November 19, 2003, under the order 
number I-3138. 

Additionally, an object of the present invention likewise is a genomic DNA bank 
of a bacterium of the species Legionella pneumophila Lens strain, characterized in that 
this is the bank deposited with the CNCM on September 23, 2004, under the order 
number I-3306. 

Additionally, an object of the present invention likewise is a vector or a host cell 
as claimed in Claim 38 or 42, characterized in that this is the vector or the cell deposited 
with the CNCM on November 19, 2003, under the order number I-3137. 

One of the advantages of using the BAC system relative to a cosmid system is 
that the plasmid utilized is present at a maximum of only two copies per transformed cell, 
which reduces the potential for recombination between DNA fragments and, more 
importantly, eliminates the risk of lethal overexpression of cloned bacterial 
genes. Nevertheless, the presence of the BAC as a single copy means that the plasmid 
DNA must be extracted from a large volume of culture in order to obtain enough DNA 
for sequencing. In addition, the stability and fidelity of maintenance of the clones in a 
BAC bank enable identification of genomic differences among different strains of 
Legionella, and identification of those genetic differences which can be responsible for 
the phenotypical variations observed between the different strains. 

The genomic DNA banks described in the present invention effectively cover 
the genome of Legionella pneumophila Paris and Lens strains. All the same, although it 
is possible that certain regions could not be cloned in said banks, by virtue of 
lethality problems in Escherichia coli, these regions can easily be amplified and 
identified by the specialist, by utilizing oligonucleotides specific to the sequences of the 
ends of the different clones which form the contigs. 

Additionally, an object of the present invention likewise is a method for 
isolating a polynucleotide of interest present in a bacterium of the Legionella genus and 
absent from a bacterium of another genus, or present in a pathogenic bacterium of the 
Legionella genus and absent from a non-pathogenic bacterium of the Legionella genus, or 
again present in a bacterium of the Legionella pneumophila species and absent from a 
bacterium of any other species of the Legionella genus, or again present in a bacterium of 
the Legionella pneumophila species Paris and/or Lens and/or Philadelphia strain and 
absent from a bacterium of the Legionella pneumophila species of any other strain, 
characterized in that it utilizes at least the BAC bank deposited on November 19, 2003 
(I-3138) with the CNCM and the BAC bank deposited on September 23, 2004 
(I-3306) with the CNCM according to the invention. 

Said method is preferably characterized in that it comprises the following stages: 

a) isolating at least one polynucleotide contained in a clone of said DNA 
bank deposited with the CNCM on November 19, 2003, under the order number I-3138 
or contained in a clone of said BAC bank deposited on September 23, 2004, under the 
number I-3306; 

b) isolating: 

at least one genomic polynucleotide or cDNA of a second bacterium of 
another genus or of the Legionella genus, said second bacterium of the Legionella genus 
belonging to a strain different from the Paris strain or, alternatively, 

at least one polynucleotide contained in a clone of a DNA bank based on 
a BAC prepared from the genome of a second bacterium of another genus or of the 
Legionella genus, said second bacterium of the Legionella genus belonging to a strain 
different from the Paris and/or Lens and/or Philadelphia strain; 

c) hybridizing the polynucleotide of stage a) to the polynucleotide of stage 
b); 

d) selecting the polynucleotides of stage a) which do not form a 
hybridization complex with the polynucleotides of stage b); and 

e) characterizing the selected polynucleotide. 
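
The selection logic of stages c) and d) can be sketched in silico as a subtraction: keep the fragments of the first bank that share no stretch of sequence with any fragment of the second. In the sketch below, "hybridization" is approximated, as an assumption, by a shared exact k-mer on either strand; the function names, the clone names, the value of k and the sequences are all hypothetical.

```python
from typing import Dict, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hybridizes(a: str, b: str, k: int = 12) -> bool:
    """Crude proxy for hybridization: a and b share a k-mer on either strand."""
    a, b = a.upper(), b.upper()
    b_kmers = kmers(b, k) | kmers(reverse_complement(b), k)
    return not kmers(a, k).isdisjoint(b_kmers)


def select_specific(bank_a: Dict[str, str], bank_b: List[str], k: int = 12) -> List[str]:
    """Stage d): names of bank_a fragments forming no complex with any bank_b fragment."""
    return [
        name
        for name, seq in bank_a.items()
        if not any(hybridizes(seq, other, k) for other in bank_b)
    ]


# Made-up fragments, for illustration only.
bank_a = {"clone1": "ATGCGTACGTTAGCCTAGGA", "clone2": "TTTTGGGGCCCCAAAATTTTGG"}
bank_b = ["CCCATGCGTACGTTAGGG"]
print(select_specific(bank_a, bank_b, k=8))
```

The fragments returned by select_specific are the candidates that stage e) would then characterize.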

The polynucleotide of stage a) can be prepared by the digestion of at least one 
recombinant BAC clone with an appropriate restriction enzyme and, optionally, 
amplification of the resulting polynucleotide insert. 

Therefore, the method of the invention enables the specialist to carry out 
comparative genomic studies between the different strains or species of the Legionella 
genus, for example between the pathogenic strains and their non-pathogenic equivalents. 

In particular, it is possible to study and determine the regions of polymorphism 
between said strains. 

LEGENDS OF THE FIGURES 

Figure 1: Circular genomic map of L. pneumophila strain Paris and specific 
genes of L. pneumophila strain Lens. From the exterior: circle 1: genes of strain Paris 
on the + and - strands respectively. Red line, inversion in strain Lens. Color code: green, 
genes of strain Paris; black, rRNA operons; red, known virulence genes; the numbers 
indicate their position: 1, lvh-lvr type IV secretion system (lvrABC, lvhB2B3B4B5, 
lvrD, lvhB6B8B9B10B11D4, lvrE); 2, dot/icm type IV secretion system 
(icmTSRQOMLKEGCDJBF); 3, mip; 4, lspA; 5, lspDE; 6, htrA; 7, lspFGHIJK; 8, 
enhABC; 9, dot/icm type IV secretion system (icmVWX and dotABCD); 10, momp; 
circle 2: specific genes of strain Lens relative to strain Paris; circle 3: G/C bias (G+C/G-C) 
of strain Paris; circle 4: G+C content of strain Paris, with <32.5% G+C in light 
yellow, between 32.5% and 44.1% in yellow and >44.1% G+C in dark yellow. The 
scale (Mb) is indicated on the outside, with the origin of replication at position 0. 
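
For readers wishing to reproduce tracks of the kind shown in circles 3 and 4, the sketch below computes G+C content per window and assigns the three color classes of the legend (thresholds 32.5% and 44.1%). For the bias track it uses the conventional GC skew (G-C)/(G+C), which is assumed here to be what the legend's "G/C bias" denotes; the window size, the function names and the test sequence are hypothetical.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """G+C content of seq, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """Conventional GC skew (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def gc_class(percent: float) -> str:
    """Color class of the Figure 1 legend for a given G+C percentage."""
    if percent < 32.5:
        return "light yellow"
    if percent <= 44.1:
        return "yellow"
    return "dark yellow"


def window_profile(seq: str, size: int, step: int) -> Iterator[Tuple[int, float, float, str]]:
    """Yield (start, G+C %, skew, color class) for successive full windows."""
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        percent = gc_content(window)
        yield start, percent, gc_skew(window), gc_class(percent)


# Made-up sequence, for illustration only.
for row in window_profile("AAAAGGGGATGCATGC", size=4, step=4):
    print(row)
```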

Figure 2: Phylogenetic tree built from a multiple sequence alignment of kinase 
domains from Legionella pneumophila strain Paris with other prokaryotic and eukaryotic 
kinases, utilizing the MEGA program. The calculation was made by utilizing the 
Poisson correction as the distance method and the Neighbor-joining method as the tree 
construction method. The scale bar of 0.2 indicates 2 amino acid substitutions per 10 
sites. The Nrprot accession numbers, the names of genes and the names of organisms are 
indicated on the diagram. The numbers indicate the bootstrap values. 
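
The Poisson correction named in this legend converts the observed proportion p of differing amino acid sites between two aligned sequences into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below computes only this pairwise distance, not the Neighbor-joining tree; skipping gapped positions is an assumption of the sketch, and the function names and peptide sequences are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned protein sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # saturated: every compared site differs
    return -math.log(1.0 - p)


# Made-up aligned peptides differing at 2 of 10 sites, for illustration only.
print(poisson_distance("MKTAYIAKQR", "MKSAYIAKQK"))
```

The corrected distance is always at least as large as p, because it allows for multiple substitutions at the same site.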
Figure 3: Diagram representing the core genome and the strain-specific gene complement 
of L. pneumophila Paris, Lens and Philadelphia strains. The orthologous genes were 
defined by reciprocal best-match FASTA comparisons. The threshold was 
defined at a minimum of 80 % sequence identity and at a length ratio of 0.75 to 
1.33. The coding sequences of Philadelphia were determined by GeneMark 
predictions utilizing the « CAAT-box » program and the sequence obtained on the site 
http://www.genome3.cpmc.columbia.edu/-legion/proiect/ (latest version). 
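
The ortholog definition used for this figure can be sketched as a reciprocal best-hit filter over two tables of pairwise comparison results. The sketch assumes that the 80 % figure acts as a lower bound on sequence identity and that the length ratio is query length over subject length; the record layout, function names, gene names and scores are hypothetical and do not reproduce FASTA output.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent sequence identity
    query_len: int
    subject_len: int
    score: float      # higher is better


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit for each query gene."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.query not in best or hit.score > best[hit.query].score:
            best[hit.query] = hit
    return best


def passes_threshold(hit: Hit, min_identity: float = 80.0,
                     min_ratio: float = 0.75, max_ratio: float = 1.33) -> bool:
    """Identity and length-ratio criteria (assumed reading of the legend)."""
    ratio = hit.query_len / hit.subject_len
    return hit.identity >= min_identity and min_ratio <= ratio <= max_ratio


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (gene_a, gene_b) that are each other's best hit and pass the thresholds."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    pairs = []
    for gene_a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is not None and back.subject == gene_a and passes_threshold(hit):
            pairs.append((gene_a, hit.subject))
    return sorted(pairs)


# Made-up comparison results, for illustration only.
a_vs_b = [
    Hit("parA1", "lensB1", 95.0, 300, 310, 500.0),
    Hit("parA1", "lensB2", 85.0, 300, 290, 400.0),
    Hit("parA2", "lensB2", 70.0, 200, 200, 150.0),
    Hit("parA3", "lensB3", 90.0, 100, 200, 180.0),
]
b_vs_a = [
    Hit("lensB1", "parA1", 95.0, 310, 300, 500.0),
    Hit("lensB2", "parA2", 70.0, 200, 200, 150.0),
    Hit("lensB3", "parA3", 90.0, 200, 100, 180.0),
]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Genes left without a qualifying reciprocal partner would fall into the strain-specific complement rather than the core.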

Figure 4: A. Comparison of the RTX protein-coding genes of L. pneumophila 
strains AA100, Paris and Lens. The sequence of the rtxA locus of strain AA100 was 
obtained from the NCBI database (AAD41583). The dotted lines indicate that the 
correct number of repetitions is uncertain. B. Consensus sequences of the highly 
conserved repeated motifs of strains Paris and Lens. The amino acid sequences in black 
indicate 11 amino acids of the conserved N-terminal sequence of strains Paris and Lens; 
the amino acid sequences in color represent the repeated motifs of each strain (same 
color as for A). The underlined amino acids indicate the positions which can change 
among the repetitions. 

Figure 5: Schematic illustration of the different stages of intracellular growth of L.
pneumophila in macrophages. The different phases are numbered: 1) Adhesion and
invasion of L. pneumophila in the host cell. 2) The phagosome does not fuse with the
lysosomes but recruits organelles and converts to a compartment of rough endoplasmic
reticulum type. 3) Intracellular replication; non-flagellated L. pneumophila inside a
phagosome. 4) Release of flagellated L. pneumophila. In red: different important
stages in the infectious cycle of L. pneumophila. In blue: hypothesis indicating the
stages at which the identified proteins could interfere in this cycle.

Figures 6 and 7: Southern blot showing the specificity of the repeated sequence
SEQ ID N° 7074 in L. pneumophila; the legend of Figure 6 is given by Table XXV and
that of Figure 7 by Table XXVI.

EXAMPLES 

Example 1 : Materials and methods 
1. Construction of banks

Shotgun bank of small fragments (size 1.5 to 2.5 kb)

The chromosomal DNA of the strains studied was prepared by a classic method
including proteinase K treatment and phenol extraction (9). Approximately 36 µg of
DNA were fragmented by nebulization (1 minute under a pressure of 1 bar) (4). The ends of
the DNA fragments were made blunt by treatment with the DNA polymerase of
bacteriophage T4 for 15 minutes at 37°C in the presence of the four nucleotide
triphosphates. The enzyme was inactivated by incubation for 15 min at 75°C. Adaptors
(Invitrogen Cat. N° 408-18) were ligated to these ends. After ligation, the fragments
of chromosomal DNA of a size between 1500 and 2500 base pairs were purified after
electrophoresis on agarose gel. The vector utilized for construction of the bank,
pcDNA2.1 (Invitrogen), was digested by the BstXI enzyme and purified by Geneclean
(BIO-101) after electrophoresis on agarose gel. The chromosomal DNA and the purified
vector were ligated by the action of the bacteriophage T4 ligase. The ligation
mixture was introduced by transformation into the Escherichia coli strain XL2-Blue
(Stratagene). Approximately 4000 colonies were obtained per µl of the ligation mixture.

Bank of medium-sized fragments (size 5 to 10 kb)

The bank was constructed by the 'partial fill-in' technique in the vector
pSYX34 (12). The chromosomal DNA of the L. pneumophila Paris strain was prepared
by partial digestion with SauIIIA (Roche). After precipitation of the DNA in sodium
acetate and partial fill-in with the A and G nucleotides by utilizing the
Klenow enzyme, the fragments of chromosomal DNA having a size of between 5000
and 10000 base pairs were purified after electrophoresis on agarose gel and Geneclean.

The vector was prepared in the same way by partial digestion with the SalI
enzyme, precipitation in sodium acetate, partial fill-in with the C and T
nucleotides, and purification on agarose gel and Geneclean. The fragments of
chromosomal DNA and the purified vector were ligated by the action of the
bacteriophage T4 ligase. The ligation mixture was introduced by transformation into the
Escherichia coli strain XL10-Gold (Stratagene). Around 4000 colonies were obtained per µl
of the ligation mixture. The two ends of around 4000 fragments of this bank were
sequenced.

Bank of large fragments (size 25 to 90 kb)

The bank of large fragments was constructed as described previously (4) by
utilizing the pIndigoBAC vector (Epicentre). Briefly, in order to avoid mechanical
shearing of the DNA molecules, the cells were embedded in agarose blocks in which
DNA extraction was performed directly. For the preparation of large-sized fragments we
performed partial digestion with HindIII (Roche) and separation by pulsed-field
gel electrophoresis. Fragments of sizes between 40 and 80 kb and between 80 and 130 kb were
excised from the gel, purified by agarase treatment and ligated with the vector. The
ligation mixture was introduced by electroporation into the Escherichia coli strain
DH10B (Gibco BRL). 1300 colonies were stored. The plasmid DNAs of these 1300
colonies were extracted and the two ends of the cloned fragments were sequenced.
2. Preparation of plasmids and sequencing. 

The plasmids were prepared by a semi-automatic preparation method developed
at the GMP laboratory and based on the alkaline lysis method (2). The chromosomal
inserts were sequenced from both ends by utilizing the T7 and universal primers,
following the recommendations of the supplier (Applied Biosystems). The sequences
were determined by utilizing automatic sequencers of type 3700 (Applied Biosystems).

3. Assembly of sequences.

The sequences were assembled by utilizing the software suite developed at the
University of Washington, Phred, Phrap and Consed (5, 8). The sequence completion
was done by utilizing the software suite CAAT-box (7). The finishing stage corresponds
to resequencing of the regions where the sequence is of low confidence and
sequencing of the regions located between the contigs. It was carried out either by
sequencing PCR products or by working on the bank clones. The sequences of
oligonucleotides were defined by utilizing the Consed and Primo software (8, 10).
4. Annotation of sequences. 

The identification of the coding sequences (CDS) was done by utilizing the
software suite CAAT-box (7). This program combines the results of different methods:

- (i) identification of open reading frames and their sorting as a function of their size;

- (ii) analysis of the coding probability by utilizing the GeneMark
software (11);

- (iii) identification of the translation start (initiation codon and ribosome
binding site); and

- (iv) the % of identity of the deduced protein sequence with the protein sequences
contained in sequence banks by utilizing BLASTP software.
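
Step (i) above can be illustrated with a short Python sketch that scans the three forward reading frames for ATG...stop open reading frames and sorts them by size. This is not the CAAT-box implementation, whose internals are not described here; the function name, the minimum length parameter, the restriction to ATG starts and to the forward strand are all simplifying assumptions.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=90):
    """Return forward-strand ORFs as (start, end, frame), longest first.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, frame is 0, 1 or 2. Only the first ATG after a stop is
    used as the start, and the reverse strand is ignored in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None
    # Sort as a function of size, largest ORF first.
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


if __name__ == "__main__":
    toy = "ATGAAATAA" + "C" + "ATGAAAAAATGA"  # hypothetical sequence
    for start, end, frame in find_orfs(toy, min_len=9):
        print(start, end, frame, toy[start:end])
```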

The functions of the proteins coded by the identified coding sequences were
predicted by analysis of the results of similarity searches in the non-redundant NCBI
bank (http://www.ncbi.nlm.nih.gov/BLAST/) by utilizing BLASTP software (1).
5. Comparison of the genomes - identification of the CDS specific to the
L. pneumophila Paris strain.

All the protein sequences deduced from the predicted coding sequences of each
genome were compared to all the protein sequences possibly coded by the other genome
by using BLASTP software. A threshold of 75 % of identity over the entire length
of the protein was retained to identify the proteins specific to an isolate. This very high
value was retained since it best allows discrimination of the orthologous genes from the
paralogous genes (6). For the protein sequences for which the sequence conservation is
high (> 70 %), the conservation of the nucleotide sequences of the genes will also be
high and could give a signal in low-stringency hybridization conditions. It will be
necessary to consider this possibility in the analysis of the test result.
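
The selection rule described in this section can be sketched as follows. The Python example assumes that, for each protein, the number of residues identical to its best hit in the other genome is already known (for instance from a BLASTP comparison); the data layout and function names are hypothetical. Proteins below the 75 % full-length identity threshold are treated as specific, and those that nevertheless exceed 70 % are flagged as candidates for cross-hybridization at low stringency.

```python
def full_length_identity(identical_residues, protein_length):
    """Percent identity computed over the entire length of the protein."""
    return 100.0 * identical_residues / protein_length


def classify(proteins, best_hit_identities, threshold=75.0, hybridization_flag=70.0):
    """Split proteins into specific and shared sets.

    proteins: dict mapping protein id to its length in amino acids
    best_hit_identities: dict mapping protein id to the number of residues
        identical to the best hit in the other genome (absent if no hit)
    Returns (specific, shared, flagged), each a sorted list of ids; flagged
    holds specific proteins whose identity still exceeds hybridization_flag.
    """
    specific, shared, flagged = [], [], []
    for pid, length in proteins.items():
        identity = full_length_identity(best_hit_identities.get(pid, 0), length)
        if identity < threshold:
            specific.append(pid)
            if identity > hybridization_flag:
                flagged.append(pid)
        else:
            shared.append(pid)
    return sorted(specific), sorted(shared), sorted(flagged)


if __name__ == "__main__":
    # Hypothetical protein lengths and identical-residue counts.
    lengths = {"a": 100, "b": 200, "c": 100, "d": 100}
    identical = {"b": 190, "c": 72, "d": 33}
    print(classify(lengths, identical))
```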

Example 2: Deposit of biological material

The following organisms were deposited on November 19, 2003 with the
Collection Nationale de Cultures de Microorganismes (CNCM) [National Collection of
Microorganism Cultures], 25 rue du Docteur Roux, 75724 Paris Cedex 15, France,
according to the provisions of the Budapest Treaty:

Clone of a shotgun bank, cloned in the pCDNA vector, of the genome of
Legionella pneumophila Paris strain (Pasteur Institute Collection CIP 107-629-T),
registered under the file number I-3137. The insert of this clone has a size of 14.2 kb
and contains a gene coding for an autotransporter called led0019A07;

BAC DNA bank (1248 clones) of the genome of Legionella pneumophila Paris
strain (Pasteur Institute Collection CIP 107-629-T), registered under file number I-3138.
Said BAC bank (I-3138) was made in the E. coli DH10B strain (Grant et al., PNAS,
87:4645, 1990). The inserts of this bank were cloned in the pBeloBAC-Kan vector
(Mozo et al., Mol. Gen. Genet., 1998, 258:562-70) and have an average size of between
1.5 and 2.5 kb. The total of these inserts corresponds to complete coverage of the
genome.

Example 3: Annotations of sequences 

1. Genes specific to L. pneumophila Paris strain relative to the L. pneumophila
Philadelphia strain

There is no significant identity between the nucleotide sequence of the gene of L.
pneumophila Paris strain and the genome of L. pneumophila Philadelphia strain.



Table VII: Example of annotation of sequences in the case of protein and nucleic
sequences of L. pneumophila Paris strain having no significant % of identity with,
respectively, protein and nucleic sequences of L. pneumophila Philadelphia strain

IPF No. of the gene of    IPF No. of the gene of    % of identity    % of identity
L. pneumophila            L. pneumophila            of protein       of nucleotide
Paris strain              Philadelphia strain       sequences        sequences
                          (best score)

2043.1
2094.2
2039.1
2051.2                    3061.1                    33%              not significant
3425.1                    5305.1                    32%              not significant



2. Genes common to the two strains L. pneumophila Paris strain and Philadelphia strain
for which the % of identity of deduced nucleic and protein sequences is less than 75 %

Table VIII: Example of annotation of sequences in the case of protein and nucleic
sequences of genes common to the two strains of L. pneumophila Paris strain and
Philadelphia strain for which the % of identity of the deduced nucleic and protein
sequences is less than 75 %

IPF No. of the gene of    IPF No. of the gene of    % of identity    % of identity
L. pneumophila Paris      L. pneumophila            of protein       of nucleotide
strain                    Philadelphia strain       sequences        sequences
                          (best score)

2244.2                    3793.1                    63%              59%
258.2                     1342.1                    60%              59%



3. Genes common to L. pneumophila Paris strain and Philadelphia strain for which the 
% of identity of the deduced nucleic and proteic sequences is greater than 75 % 

Table IX: Example of annotation of sequences in the case of protein and nucleic
sequences of genes common to the two strains of L. pneumophila Paris strain and
Philadelphia strain for which the % of identity of the deduced nucleic and protein
sequences is greater than 75 %

IPF No. of                IPF No. of                % of identity    % of identity
L. pneumophila            L. pneumophila            of protein       of nucleotide
Paris strain gene         Philadelphia strain gene  sequences        sequences
                          (best score)

4629.2                    133.1                     100%             100%
6079.1                    4147.1                    90%              88%



Example 4: Example of alignment of sequences 

Presented hereinbelow are the alignments of sequences conserved in the Paris
and Philadelphia strains. For each of the six examples which follow, we present an
alignment of the nucleotide sequences as well as an alignment of the amino acid
sequences. The alignment of the amino acid sequences is obtained by aligning the
translated sequence of the ORF present in the Paris strain with the sequence originating
from translation in the six reading frames of the contigs of the Philadelphia sequence. The
sequence homology of these ORFs present in the two strains is very strong, as much in
amino acids as in nucleotides.
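
The translation of a nucleotide sequence in its six reading frames, on which the amino acid alignments rely, can be sketched in Python as follows. The function names are hypothetical, the standard genetic code is assumed, and the frame labels (+1 to +3 for the given strand, -1 to -3 for the reverse complement) follow a simple convention that may not coincide exactly with the frame numbering printed by TBLASTN.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the compact 64-letter representation.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame label (+1..+3, -1..-3) to protein string."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # First 18 nucleotides of the first nucleotide query shown below.
    for frame, protein in sorted(six_frame_translation("atgatcagaaaaataatt").items()):
        print(frame, protein)
```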

TBLASTN 2.2.6 [Apr-09-2003]

Query= 1764.3 CONTIG=Contig42 POSCDS1=13736 POSCDS2=14869 SENS=p, Seq Id : 555
(216 letters)

Database: contigsLpPhiladelphia
51 sequences; 3,410,887 total letters

Searching . done

                                                     Score    E
Sequences producing significant alignments:          (bits) Value

LpPhiladelphia_Contig49                                433  e-123

>LpPhiladelphia_Contig49
Length = 376826

Score = 433 bits, Expect = e-123
Frame = -2

Query: 1      HLALFDETYIKTILILLSICVLKKFILRYDMILNDIVLYDNFFMTFDYKDNFMSKGPYQS 60
Sbjct: 297565 HLALFDETYIKTILILLSICVLKKFILRYDMILNDIVLYDNFFMTFDYKDNFMSKGPYQS

Query: 61     FANRLISALKDRGYTASRSPNGICIKTLAEFTGASEQICRRYIRGDALPDYEKVKQLAFH 120
Sbjct: 297385 FANRLISALKDRGYTASRSPNGICIKTLAEFTGASEQICRRYIRGDALPDYEKVKQLAFH

Query: 121    LQVNPGWLLFGEDENATTKKNEVDEKLLHYILKQSHHLYPISQGSNDDYADFVLGLIKEV 180
Sbjct: 297205 LQVNPGWLLFGEDENATTKKNEVDEKLLHYILKQSHHLYPISQGSNDDYADFVLGLIKEV

Query: 181    KAIDTSENNLLKIIDLAIGSISSYEEKRKKHSHAV 215
Sbjct: 297025 KAIDTSENNLLKIIDLAIGSISSYEEKRKKHSHAV



Database: contigsLpPhiladelphia 

Posted date: Nov 20, 2003 10:38 AM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 

Lambda K H 

0.321 0.139 0.402 

Gapped 

Lambda K H 

0.267 0.0410 0.140 

Matrix: BLOSUM62 

Query= 1764.3 CONTIG=Contig42 POSCDS1=13736 POSCDS2=14869 SENS=p
(1134 letters)

Database: /home/Gmp/rusniok/projets/legionella/pourBrevet-191103/contigsLpPhiladelphia
51 sequences; 3,410,887 total letters

Searching done

                                                     Score    E
Sequences producing significant alignments:          (bits) Value

LpPhiladelphia_Contig35                               2248  0.0

>LpPhiladelphia_Contig35
Length = 48622

Score = 2248 bits (1134), Expect = 0.0
Identities = 1134/1134 (100%)
Strand = Plus / Plus

Query: 1    atgatcagaaaaataatttatgttacaggtactcgtgccgattatggactgatgagagaa 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7091 atgatcagaaaaataatttatgttacaggtactcgtgccgattatggactgatgagagaa 7150

Query: 61   gtactaaaaagattacaccagtcagaagacattgacttatcgatttgtgtcactggtatg 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7151 gtactaaaaagattacaccagtcagaagacattgacttatcgatttgtgtcactggtatg 7210

Query: 121  catcttgatgctttgtatggaaatacagttaacgaaattaaagcagatcagttctcaata 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7211 catcttgatgctttgtatggaaatacagttaacgaaattaaagcagatcagttctcaata 7270

Query: 181  tgcggcattattcctgttgatcttgccaatgctcagcatagttctatggcaaaagctatc 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7271 tgcggcattattcctgttgatcttgccaatgctcagcatagttctatggcaaaagctatc 7330

Query: 241  ggccatgaacttttgggattcaccgaggtattcgaaagtgaaactcctgatgtcgtttta 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7331 ggccatgaacttttgggattcaccgaggtattcgaaagtgaaactcctgatgtcgtttta 7390

Query: 301  ttgctgggagatcgaggagaaatgcttgctgcggccatagcagcgatacatttaaatatc 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7391 ttgctgggagatcgaggagaaatgcttgctgcggccatagcagcgatacatttaaatatc 7450

Query: 361  ccggttgtacatctgcacggaggagagcgctctggaaccgttgatgaaatggtaaggcat 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7451 ccggttgtacatctgcacggaggagagcgctctggaaccgttgatgaaatggtaaggcat 7510

Query: 421  gcgatttccaaattatctcattatcattttgtcgcaacagaggcatccaaacaacgattg 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7511 gcgatttccaaattatctcattatcattttgtcgcaacagaggcatccaaacaacgattg 7570

Query: 481  attagaatgggtgagaaagaagaaaccatttttcaggttggtgctccaggcttggatgaa 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7571 attagaatgggtgagaaagaagaaaccatttttcaggttggtgctccaggcttggatgaa 7630

Query: 541  atcatgcagtataaaacgtctacacgtgatgtgtttaatcaacgttatggatttgatcct 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7631 atcatgcagtataaaacgtctacacgtgatgtgtttaatcaacgttatggatttgatcct 7690

Query: 601  gacaaaaaaatctgtttattaatctatcacccggttgttcaagaagttgactcgattaaa 660
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7691 gacaaaaaaatctgtttattaatctatcacccggttgttcaagaagttgactcgattaaa 7750

Query: 661  attcaatttcaaagcgtgattcaggcagcactcgctacaaatttacagattatttgcctt 720
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7751 attcaatttcaaagcgtgattcaggcagcactcgctacaaatttacagattatttgcctt 7810

Query: 721  gagcctaattccgatacgggtggtcatttaattcgagaagtgattcaggaatatattgat 780
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7811 gagcctaattccgatacgggtggtcatttaattcgagaagtgattcaggaatatattgat 7870

Query: 781  catcctgatgttagaattatcaagcacttacatcgtccggaatttattgattgtcttgca 840
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7871 catcctgatgttagaattatcaagcacttacatcgtccggaatttattgattgtcttgca 7930

Query: 841  aattctgatgtgatgctgggaaattccagtagtggcatcatagaggcagcctcatttaac 900
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7931 aattctgatgtgatgctgggaaattccagtagtggcatcatagaggcagcctcatttaac 7990

Query: 901  ctgaacgtagttaatgttggaagcaggcaaaatttaagagaacgaagcgacaatgtcatt 960
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7991 ctgaacgtagttaatgttggaagcaggcaaaatttaagagaacgaagcgacaatgtcatt 8050

Query: 961  gatgttgatgttacttatgatgctattttgactggtctaagagaagcgctaaataaaccc 1020
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8051 gatgttgatgttacttatgatgctattttgactggtctaagagaagcgctaaataaaccc 8110

Query: 1021 aagataaaatactctaactgttatggggatggaaaaacgagtgaaaggtgttatcaattg 1080
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8111 aagataaaatactctaactgttatggggatggaaaaacgagtgaaaggtgttatcaattg 8170

Query: 1081 ttaaaaactatccctttgcactcacaaatattgaataaatgcaatgcatactaa 1134
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8171 ttaaaaactatccctttgcactcacaaatattgaataaatgcaatgcatactaa 8224



TBLASTN 2.2.6 [Apr-09-2003]

Query= 1864.3 CONTIG=Contig42 POSCDS1=77740 POSCDS2=79155 SENS=p, Seq Id : 622
(489 letters)

Database: contigsLpPhiladelphia
51 sequences; 3,410,887 total letters

Searching . done

                                                     Score    E
Sequences producing significant alignments:          (bits) Value

LpPhiladelphia_Contig49                               1003  0.0

>LpPhiladelphia_Contig49
Length = 376826

Score = 1003 bits (2594), Expect = 0.0
Identities = 488/488 (100%), Positives = 488/488 (100%)
Frame = +2

Query: 1     KLSLPLIRLWQLSRSKHMFKPQGLYDYICQQWQEEILPSLCDYIKIPNKSPHFDAKWEEH 60
Sbjct: 21029 KLSLPLIRLWQLSRSKHMFKPQGLYDYICQQWQEEILPSLCDYIKIPNKSPHFDAKWEEH

Query: 61    GYMEQAVNHIANWCKSHAPKGMTLEIVRLKNRTPLLFMEIPGQIDDTVLLYGHLDKQPEM
Sbjct: 21209 GYMEQAVNHIANWCKSHAPKGMTLEIVRLKNRTPLLFMEIPGQIDDTVLLYGHLDKQPEM

Query: 121   SGWSDDLHPWKPVLKNGLLYGRGGADDGYSAYASLTAIRALEQQGLPYPRCILIIEACEE
Sbjct: 21389 SGWSDDLHPWKPVLKNGLLYGRGGADDGYSAYASLTAIRALEQQGLPYPRCILIIEACEE

Query: 181   SGSYDLPFYIELLKERIGKPSLVICLDSGAGNYEQLWMTTSLRGNLVGKLTVELINEGVH
Sbjct: 21569 SGSYDLPFYIELLKERIGKPSLVICLDSGAGNYEQLWMTTSLRGNLVGKLTVELINEGVH

Query: 241   SGSASGIVADSFRVARQLISRIEDENTGEIKLPQLYCDIPDERIKQAKQCAEILGEQVYS
Sbjct: 21749 SGSASGIVADSFRVARQLISRIEDENTGEIKLPQLYCDIPDERIKQAKQCAEILGEQVYS

Query: 301   EFPWIDSAKPVIQDKQQLILNRTWRPALTVTGADGFPAIADAGNVMRPVTSLKLSMRLPP
Sbjct: 21929 EFPWIDSAKPVIQDKQQLILNRTWRPALTVTGADGFPAIADAGNVMRPVTSLKLSMRLPP

Query: 361   LVDPEAASVAMEKALTQNPPYNAKVDFKIQNGGSKGWNAPLLSDWLAKAASEASMTYYDK
Sbjct: 22109 LVDPEAASVAMEKALTQNPPYNAKVDFKIQNGGSKGWNAPLLSDWLAKAASEASMTYYDK

Query: 421   PAAYMGEGGTIPFMSMLGEQFPKAQFMITGVLGPHSNAHGPNEFLHLDMVKKLTSCVSYV
Sbjct: 22289 PAAYMGEGGTIPFMSMLGEQFPKAQFMITGVLGPHSNAHGPNEFLHLDMVKKLTSCVSYV

Query: 481   LYSFSQKK
Sbjct: 22469 LYSFSQKK



Database: contigsLpPhiladelphia 

Posted date: Nov 20, 2003 10:38 AM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 

Lambda K H 

0.318 0.136 0.419 

Gapped 

Lambda K H 

0.267 0.0410 0.140 

Matrix: BLOSUM62 
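
The relation between the raw scores shown in parentheses and the bit scores of the TBLASTN reports can be checked with the standard conversion S' = (Lambda x S - ln K) / ln 2. The short Python sketch below applies it with the gapped Lambda and K values printed in the reports; the function name is hypothetical, and because the printed Lambda is rounded to three decimals the result agrees with the reported bit score only to within about one bit.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score: (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Gapped parameters printed in the reports: Lambda = 0.267, K = 0.0410.
    # Raw score 2594 is reported as 1003 bits.
    print(round(bit_score(2594, 0.267, 0.0410), 1))
```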

Query= 1864.3 CONTIG=Contig42 POSCDS1=77740 POSCDS2=79155 SENS=p
(1416 letters)

Database: /home/Gmp/rusniok/projets/legionella/pourBrevet-191103/contigsLpPhiladelphia
51 sequences; 3,410,887 total letters

Searching . done

                                                     Score    E
Sequences producing significant alignments:          (bits) Value

LpPhiladelphia_Contig49                               2807  0.0

>LpPhiladelphia_Contig49
Length = 376826

Score = 2807 bits (1416), Expect = 0.0
Identities = 1416/1416 (100%)
Strand = Plus / Plus

Query: 1     atgttcaaaccccaaggattgtatgattacatatgccaacagtggcaagaagagatattg 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21080 atgttcaaaccccaaggattgtatgattacatatgccaacagtggcaagaagagatattg 21139

Query: 61    ccaagtttatgtgactacataaaaatccctaataaatctcctcactttgatgcaaaatgg 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21140 ccaagtttatgtgactacataaaaatccctaataaatctcctcactttgatgcaaaatgg 21199

Query: 121   gaagaacatggttatatggagcaggcagttaatcacattgccaattggtgtaagtcgcat 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21200 gaagaacatggttatatggagcaggcagttaatcacattgccaattggtgtaagtcgcat 21259

Query: 181   gctcccaaaggaatgactctggaaattgttcgcctgaaaaataggactccattactattt 240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21260 gctcccaaaggaatgactctggaaattgttcgcctgaaaaataggactccattactattt 21319

Query: 241   atggaaattccaggccaaattgatgacactgtgttgctttatgggcacttggataaacaa 300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21320 atggaaattccaggccaaattgatgacactgtgttgctttatgggcacttggataaacaa 21379

Query: 301   cctgagatgtcaggctggagtgacgatttacatccatggaaacccgtattgaaaaatgga 360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21380 cctgagatgtcaggctggagtgacgatttacatccatggaaacccgtattgaaaaatgga 21439

Query: 361   ttgttatacggaagaggaggggcagatgatggatattctgcttatgcatcactcacggct 420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21440 ttgttatacggaagaggaggggcagatgatggatattctgcttatgcatcactcacggct 21499

Query: 421   attcgcgccttggaacagcaaggtttgccatatcctcgttgtatattaatcatcgaagcg 480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21500 attcgcgccttggaacagcaaggtttgccatatcctcgttgtatattaatcatcgaagcg 21559

Query: 481   tgtgaggaaagtggcagttacgatttgcctttttatattgagttgctgaaagagcgtatt 540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21560 tgtgaggaaagtggcagttacgatttgcctttttatattgagttgctgaaagagcgtatt 21619

Query: 541   ggtaaaccatcattggttatttgtcttgattccggagcaggtaattatgagcagttatgg 600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21620 ggtaaaccatcattggttatttgtcttgattccggagcaggtaattatgagcagttatgg 21679

Query: 601   atgactacgtcattacgcggtaatttggtcggtaagttaactgttgaattaattaatgag 660
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21680 atgactacgtcattacgcggtaatttggtcggtaagttaactgttgaattaattaatgag 21739

Query: 661   ...gttcattctgggag...

Query: 721   ttgatcagcaggatagaggacgaaaacaccggagagataaaattacct... 780
Sbjct: 21800 ttgatcagcaggatagaggacgaaaacaccgg... 21859

Query: 781   gatattcctgatgag... 840
Sbjct: 21860 gatattcctgatgagagaataaaacaagcgaaacaa... 21919

Query: 841   gtttatagcgaatttccatggatagattctgccaaac... 900
Sbjct: 21920 gtttatagcgaatttccatggatagattctgcc... 21979

Query: 901   ttaatattaaacagaacatggcgccctgccttgacggtgactggtgcagatgggtttcca 960
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 21980 ttaatattaaacagaacatggcgccctgccttgacggtgactggtgcagatgggtttcca 22039

Query: 961   gcgatagctgatgcagggaacgtaatgcgccctgttacgtctttgaaattatccatgcgc 1020
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 22040 gcgatagctgatgcagggaacgtaatgcgccctgttacgtctttgaaattatccatgcgc 22099

Query: 1021  cttccaccactggttgatccagaagcagcttctgttgctatggaaaaagccctgacccaa 1080
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 22100 cttccaccactggttgatccagaagcagcttctgttgctatggaaaaagccctgacccaa 22159

Query: 1081  aaccctccctataatgcaaaggttgattttaaaatacaaaatggagggtccaagggatgg 1140
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 22160 aaccctccctataatgcaaaggttgattttaaaatacaaaatggagggtccaagggatgg 22219

Query: 1141  aatgctcctttgctttccgattggttagcgaaagcggcatctgaagcatcaatgacttat 1200
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 22220 aatgctcctttgctttccgattggttagcgaaagcggcatctgaagcatcaatgacttat 22279

Query: 1201  tatgataaacctgctgcttacatgggagaggggggcaccattccatttatgagtatgcta 1260
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 22280 tatgataaacctgctgcttacatgggagaggggggcaccattccatttatgagtatgcta 22339

Query: 1261  ggcgagcaatttcccaaagcacaatttatgataactggtgttttaggcccccattccaat 1320
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 22340 ggcgagcaatttcccaaagcacaatttatgataactggtgttttaggcccccattccaat 22399

Query: 1321  gctcatggtccgaacgagt...
Sbjct: 22400 gctcatggtccgaacgagttcttacatttgga... 22459

Query: 1381  tcgtacgttctttatagtttttcacagaaaaaataa 1416
             ||||||||||||||||||||||||||||||||||||
Sbjct: 22460 tcgtacgttctttatagtttttcacagaaaaaataa 22495

TBLASTN 2.2.6 [Apr-09-2003]

Query= 1865.3 CONTIG=Contig42 POSCDS1=76674 POSCDS2=77765 SENS=p, Seq Id : 623
(367 letters)

Database: contigsLpPhiladelphia
51 sequences; 3,410,887 total letters

Searching . done

                                                     Score    E
Sequences producing significant alignments:          (bits) Value

LpPhiladelphia_Contig49                                718  0.0

>LpPhiladelphia_Contig49
Length = 376826

Score = 718 bits (1853), Expect = 0.0
Identities = 366/366 (100%), Positives = 366/366 (100%)
Frame = +1

Query: 1     GNIMSPSIVFTGGGTAGHVTPNIALIKEFRKEGWNVEYIGSVSGIEKEMIEPLDIPFHGV 60
Sbjct: 20005 GNIMSPSIVFTGGGTAGHVTPNIALIKEFRKEGWNVEYIGSVSGIEKEMIEPLDIPFHGV 20184

Query: 61    SSGKLRRYFSLKNLLDPFKIVLGIIQSSLLFYKIKPDVVFSKGGFVAFPVVVGAWLNRIP 120
Sbjct: 20185 SSGKLRRYFSLKNLLDPFKIVLGIIQSSLLFYKIKPDVVFSKGGFVAFPVVVGAWLNRIP 20364

Query: 121   VVAHESDMSPGLANRLSFPFVNKICLTFDAGKKYFKRQDKIEVTGTPIRQQLLTGNRMKG 180
Sbjct: 20365 VVAHESDMSPGLANRLSFPFVNKICLTFDAGKKYFKRQDKIEVTGTPIRQQLLTGNRMKG 20544

Query: 181   LELCGFNSSKPCLLVVGGSLGAGSINSCIRSALKQLTSEFQVIHLCGKGKLDSSLVGVEG 240
Sbjct: 20545 LELCGFNSSKPCLLVVGGSLGAGSINSCIRSALKQLTSEFQVIHLCGKGKLDSSLVGVEG 20724

Query: 241   YCQFEYANEELADLFAASSVVISRAGANSLYEILALGKPHILIPISSQVSRGDQIQNARY 300
Sbjct: 20725 YCQFEYANEELADLFAASSVVISRAGANSLYEILALGKPHILIPISSQVSRGDQIQNARY 20904

Query: 301   FQGLGISVVIQDELLKADVLLQAVQDVMRKKDEIDNKIKALKIESATDKIVAIIKEQAHV 360
Sbjct: 20905 FQGLGISVVIQDELLKADVLLQAVQDVMRKKDEIDNKIKALKIESATDKIVAIIKEQAHV 21084

Query: 361   QTPRIV 366
Sbjct: 21085 QTPRIV 21102

Database : contigsLpPhiladelphia 

Posted date: Nov 20, 2003 10:38 AM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 

Lambda K H 

0.321 0.139 0.399 

Gapped 

Lambda K H

0.267 0.0410 0.140 

Matrix: BLOSUM62

Query= 1865.3 CONTIG=Contig42 POSCDS1=76674 POSCDS2=77765 SENS=p

Database: /home/Gmp/rusniok/projets/legionella/pourBrevet-191103/contigsLpPhiladelphia
51 sequences; 3,410,887 total letters

Searching . done

                                                     Score    E
Sequences producing significant alignments:          (bits) Value

LpPhiladelphia_Contig49                               2165  0.0

>LpPhiladelphia_Contig49
Length = 376826

Score = 2165 bits (1092), Expect = 0.0
Identities = 1092/1092 (100%)
Strand = Plus / Plus



Query: 1     atgagcccaagtattgtttttaccgggggaggaactgccggacatgtaacgcctaatatc 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 20014 atgagcccaagtattgtttttaccgggggaggaactgccggacatgtaacgcctaatatc 20073

Query: 61    gctttgattaaggaatttcgaaaagaaggctggaatgtagaatatatcggctctgtttcc 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 20074 gctttgattaaggaatttcgaaaagaaggctggaatgtagaatatatcggctctgtttcc 20133

Query: 121   ggaattgaaaaggagatgattgagccgctggacattccttttcatggggtcagtagcggt 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 20134 ggaattgaaaaggagatgattgagccgctggacattccttttcatggggtcagtagcggt 20193

Query: 181   ...attgcgcaggtattttagtttgaagaacttgcttgatcctttcaaaattgttct... 240

Sbjct:       ...ctattttataaaatcaaacccgatgtggttttttcaaaaggt 20313

Query: 301   ...agcctttcctgtggttgtaggcgcctggttaaatcgaattcctgttgtcg... 360
Sbjct: 20314 ... 20373

Query: 361   ...agtctgatatgagcccaggacttgcgaatcgcctatcctttcctttcg... 420
Sbjct: 20374 ...

Query: 421   ...
Sbjct: 20434 ...
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Query: 481 acgggtactccaattcgtcaacagctattaactggaaatcgaatgaaaggattggagtta 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I.I I I I I I I I I I I I I I I | | | | | | | | | | | I I I I 
Sbjct: 20494 acgggtactccaattcgtcaacagctattaactggaaatcgaatgaaaggattggagtta 20553 

Query: 541 tgcggatttaattcctccaaaccttgcctgcttgtagtgggaggaagcttaggggctggt 600 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I j I I || | N I I I I 
Sbjct: 20554 tgcggatttaattcctccaaaccttgcctgcttgtagtgggaggaagcttaggggctggt 20613 

Query: 601 tcaattaacagttgtattcgaagcgcattgaaacaattgacatcagaatttcaagtcatt 660 

I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I [ I I I 
Sbjct: 20614 tcaattaacagttgtattcgaagcgcattgaaacaattgacatcagaatttcaagtcatt 20673 

Query: 661 catctttgtggcaagggaaaacttgattcttcattggttggtgtggagggatattgccaa 720 

I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Sbjct: 20674 catctttgtggcaagggaaaacttgattcttcattggttggtgtggagggatattgccaa 20733 

Query: 721 tttgaatacgccaatgaagagttggctgatctgttcgctgcttcttctgtggtgatttct 780 

I M I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 20734 tttgaatacgccaatgaagagttggctgatctgttcgctgcttcttctgtggtgatttct 20793 

Query: 781 cgagcaggagctaattctttgtatgaaatattagcattaggaaaaccacatatcttaatt 840 

I I I I I I I I I I I I 1 I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Sbjct: 20794 cgagcaggagctaattctttgtatgaaatattagcattaggaaaaccacatatcttaatt 20853 

Query: 841 ccaatctcttcacaagtaagcagaggagatcaaattcagaatgcaaggtacttccaggga 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I J I I I I I I 
Sbjct: 20854 ccaatctcttcacaagtaagcagaggagatcaaattcagaatgcaaggtacttccaggga 20913 

Query: 901 ttgggaataagcgttgtgattcaggacgagttattgaaagctgatgttctattacaggca 960 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I f I I I I I I 
Sbjct: 20914 ttgggaataagcgttgtgattcaggacgagttattgaaagctgatgttctattacaggca 20973 

Query: 961 gtacaggacgtaatgcgaaaaaaagatgaaatagataataaaatcaaagcattaaaaatt 1020 

I ! I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Sbjct: 20974 gtacaggacgtaatgcgaaaaaaagatgaaatagataataaaatcaaagcattaaaaatt 21033 

Query: 1021 gagtctgccactgataagattgtggcaattatcaaggagcaagcacatgttcaaacccca 1080 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 21034 gagtctgccactgataagattgtggcaattatcaaggagcaagcacatgttcaaacccca 21093 



Query: 1081 aggattgtatga 1092 

||||||||||||
Sbjct: 21094 aggattgtatga 21105 



TBLASTN 2.2.6 [Apr-09-2003]



Query= 2066.5 CONTIG=Contig46 POSCDS1=56766 POSCDS2=57173 SENS=p, Seq Id: 732

(150 letters)
Database: contigsLpPhiladelphia

51 sequences; 3,410,887 total letters 

Searching . done 

                                                        Score    E
Sequences producing significant alignments:             (bits)  Value

LpPhiladelphia_Contig49                                   323   4e-90

>LpPhiladelphia_Contig49
Length = 376826 

Score = 323 bits (828), Expect = 4e-90 

Identities = 149/149 (100%), Positives = 149/149 (100%) 
Frame = +1

Query: 1     IMYLRLLALSALCFVTSPIWSFTCIYTLVKDNCWTDYDVTVDVIEDSTSKTLLTLTAPKG 60
             IMYLRLLALSALCFVTSPIWSFTCIYTLVKDNCWTDYDVTVDVIEDSTSKTLLTLTAPKG
Sbjct: 89377 IMYLRLLALSALCFVTSPIWSFTCIYTLVKDNCWTDYDVTVDVIEDSTSKTLLTLTAPKG 89556

Query: 61    KSWARGTFNCEAAEGLRYVAQFSPVFWQNDVGKTYPALRNWYLPAKVNPGDLAWTIPVCF 120
             KSWARGTFNCEAAEGLRYVAQFSPVFWQNDVGKTYPALRNWYLPAKVNPGDLAWTIPVCF
Sbjct: 89557 KSWARGTFNCEAAEGLRYVAQFSPVFWQNDVGKTYPALRNWYLPAKVNPGDLAWTIPVCF 89736

Query: 121   PADFAQVPFPPNVAGNCKCNFKNIPDPKL 149
             PADFAQVPFPPNVAGNCKCNFKNIPDPKL
Sbjct: 89737 PADFAQVPFPPNVAGNCKCNFKNIPDPKL 89823

Database : contigsLpPhiladelphia 

Posted date: Nov 20, 2003 10:38 AM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 

Lambda     K      H
   0.323    0.138    0.473

Gapped
Lambda     K      H
   0.267   0.0410    0.140

Matrix: BLOSUM62 

Query= 2066.5 CONTIG=Contig46 POSCDS1=56766 POSCDS2=57173 SENS=p
(408 letters)

Database: /home/Gmp/rusniok/projets/legionella/pourBrevet-191103/contigsLpPhiladelphia

51 sequences; 3,410,887 total letters

Searching . done

                                                        Score    E
Sequences producing significant alignments:             (bits)  Value

LpPhiladelphia_Contig49                                   809   0.0

>LpPhiladelphia_Contig49
Length = 376826

Score = 809 bits (408), Expect = 0.0
Identities = 408/408 (100%)
Strand = Plus / Plus



Query: 1     gtgactagcccaatttggtctttcacatgcatctatactttggttaaagacaattgttgg 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 89419 gtgactagcccaatttggtctttcacatgcatctatactttggttaaagacaattgttgg 89478



Query. 61 "^^W^^WtKa,:^..^^ 12 „ 

69539 «taccgctceo..a, g „„tc,t 5 g,ct»„ gg t.ctttcaa«,;„,gc^cU« 89596 

6 bJ C 6969, 8 ,6S6 

Query: 241   tacccggcattaagaaattggtatttaccagcaaaagtgaatcctggagatttggcctgg 300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 89659 tacccggcattaagaaattggtatttaccagcaaaagtgaatcctggagatttggcctgg 89718

Query: 301 a ^ a ^ggtttgttt^ 

S bjCt: 89719 89?78 

Query: 361 ggaaac.gtaa^ 408 
Sb.ct: 89779 ggaaactgtaagtgcaacttcaagiaca^icigii^ill^i^il 89826 



TBLASTN 2.2.6 [Apr-09-2003] 



Query= 3159.2 CONTIG=Contig46 POSCDS1=34563 POSCDS2=35318 SENS=p, Seq Id: 1433
(265 letters)

Database : contigsLpPhiladelphia 

51 sequences; 3,410,887 total letters 

Searching . done 



                                                        Score    E
Sequences producing significant alignments:             (bits)  Value

LpPhiladelphia_Contig49                                   537   e-154

>LpPhiladelphia_Contig49
Length = 376826

Score = 537 bits (1383), Expect = e-154
Identities = 264/264 (100%), Positives = 264/264 (100%)
Frame = +1
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Sbj 



:„„ ir===ssiSi5= ,„;: 
:n: ::,„ s-s=ss=== .- 

Query: 241   VLFRERKQSWVKIQFEEESDESVF 264
             VLFRERKQSWVKIQFEEESDESVF
Sbjct: 67897 VLFRERKQSWVKIQFEEESDESVF 67968

Database: contigsLpPhiladelphia 

Posted date: Nov 20, 2003 10:38 AM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 

Lambda     K      H
   0.330    0.143    0.468

Gapped
Lambda     K      H
   0.267   0.0410    0.140

Matrix: BLOSUM62 

Query= 3159.2 CONTIG=Contig46 POSCDS1=34563 POSCDS2=35318 SENS=p

51 sequences; 3,410,887 total letters 

Searching . done

                                                        Score    E
Sequences producing significant alignments:             (bits)  Value

LpPhiladelphia_Contig49                                  1499   0.0

>LpPhiladelphia_Contig49
Length = 376826

Score = 1499 bits (756), Expect = 0.0
Identities = 756/756 (100%)
Strand = Plus / Plus

Query: 1     atgtggaagatattgtatcagttggcatcgccaaaaaatttttataactacgcgggacgt 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 67216 atgtggaagatattgtatcagttggcatcgccaaaaaatttttataactacgcgggacgt 67275



Query: 6! ^""ccctggttggcagtcagtgctttgactaccatggccattggtatggtttgggga 120 

„». 67335 ° 
Query: 121 ttggtatttgctccaccagattatcagcaaggggatgcataccgaattatttttgttcat 180 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | | | | | | | | 
Sbjct: 67336 ttggtatttgctccaccagattatcagcaaggggatgcataccgaattatttttgttcat 67395 

Query: 181 gtacccagcgcttttttatcaatggcattgtatgcctggatggggtttctggccatttta 240 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | | | | | | | | 
Sbjct: 67396 gtacccagcgcttttttatcaatggcattgtatgcctggatggggtttctggccatttta 67455 

Query: 241 ttgttggtgtggcgtatcaaaatggcagggcttttgattcataaggtcgcgcaattaggt 300 

I I I I I I I I I I I I I I I I I I I I I I f I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Sbjct: 67456 ttgttggtgtggcgtatcaaaatggcagggcttttgattcataaggtcgcgcaattaggt 67515 

Query: 301 gcctgcatggcatttcttgctttaattacagggagcatttggggtaaacccatgtggggt 360 

I I I I I I ! I I I I I I I I I I I I II I I I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I I I 
Sbjct: 67516 gcctgcatggcatttcttgctttaattacagggagcatttggggtaaacccatgtggggt 67575 

Query: 361 gcctggtgggtatgggatgcccgcctgacctcagaattaatacttttgttgctctatctg 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Sbjct: 67576 gcctggtgggtatgggatgcccgcctgacctcagaattaatacttttgttgctctatctg 67635 

Query: 421 gcaattctggctacctatcaagcggtaaaaaataaagaagatggagataaaataatagca 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 67636 gcaattctggctacctatcaagcggtaaaaaataaagaagatggagataaaataatagca 67695 

Query: 481 attttagctttggtgggtttaattgatttaccaataattcattattcagtttattggtgg 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 67696 attttagctttggtgggtttaattgatttaccaataattcattattcagtttattggtgg 67755 

Query: 541 aatactttacaccaaggtgcaactttatctgtgtttgccaaacccaaaattgctctcagt 600 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | 
Sbjct: 67756 aatactttacaccaaggtgcaactttatctgtgtttgccaaacccaaaattgctctcagt 67815 

Query: 601 atgttgtatccattgttaatcactttgctgggttttttcttgtattccttatggatcatt 660 

I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M 
Sbjct: 67816 atgttgtatccattgttaatcactttgctgggttttttcttgtattccttatggatcatt 67875 

Query: 661 ttggaaaaagcacgtaatgaagtcttattcagggagagaaagcaatcatgggttaagatt 720 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 67876 ttggaaaaagcacgtaatgaagtcttattcagggagagaaagcaatcatgggttaagatt 67935 

Query: 721 caatttgaggaagagtctgatgaatcagttttttga 756 

I I I I I I I I I I I I I I I II I I I I I I I I I II I II I I I I I 
Sbjct: 67936 caatttgaggaagagtctgatgaatcagttttttga 67971 



TBLASTN 2.2.6 [Apr-09-2003]



Query= 4774.1 CONTIG=Contig46 POSCDS1=50654 POSCDS2=50950 SENS=m, Seq Id: 2523
(103 letters)

Database: contigsLpPhiladelphia 

51 sequences; 3,410,887 total letters 

Searching . done 



                                                        Score    E
Sequences producing significant alignments:             (bits)  Value

LpPhiladelphia_Contig49                                   205   4e-55

>LpPhiladelphia_Contig49
Length = 376826



Score = 205 bits (522), Expect = 4e-55 

Identities = 102/102 (100%), Positives = 102/102 (100%) 
Frame = -1 



Query: 1     RRSKMPEIHTLDNPYITILTIFVLACFVGYYVVWKVTPALHTPLMSVTNAISSIIILGAL 60
             RRSKMPEIHTLDNPYITILTIFVLACFVGYYVVWKVTPALHTPLMSVTNAISSIIILGAL
Sbjct: 83615 RRSKMPEIHTLDNPYITILTIFVLACFVGYYVVWKVTPALHTPLMSVTNAISSIIILGAL 83436

Query: 61    IAAGSELIGCITWLGGIAIFITSINIFGGFVVTQRMLRMYKK 102
             IAAGSELIGCITWLGGIAIFITSINIFGGFVVTQRMLRMYKK
Sbjct: 83435 IAAGSELIGCITWLGGIAIFITSINIFGGFVVTQRMLRMYKK 83310



Database: contigsLpPhiladelphia 

Posted date: Nov 20, 2003 10:38 AM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 



Lambda K H 

0.330 0.144 0.442 

Gapped 

Lambda K H 

0.267 0.0410 0.140 



Matrix: BLOSUM62 



Query= 4774.1 CONTIG=Contig46 POSCDS1=50654 POSCDS2=50950 SENS=m
(297 letters) 

Database: /home/Gmp/rusniok/projets/legionella/pourBrevet-191103/contigsLpPhiladelphia

51 sequences; 3,410,887 total letters 

Searching done 

Score E 

Sequences producing significant alignments: (bits) Value 

LpPhiladelphia_Contig49                                   589   e-169

>LpPhiladelphia_Contig49
Length = 376826

Score = 589 bits (297), Expect = e-169 
Identities = 297/297 (100%) 
Strand = Plus / Minus 
Query: 1 atgcctgaaattcatacacttgataatccttatattacaatattaaccattttcgtactg 60 

i i i i i i i i i i i i i i I i.i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 

Sbjct: 83603 atgcctgaaattcatacacttgataatccttatattacaatattaaccattttcgtactg 83544 

Query: 61 gcctgttttgtaggttattatgtggtttggaaagtaacaccggctttacatacaccccta 120 

I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I It I I 
Sbjct: 83543 gcctgttttgtaggttattatgtggtttggaaagtaacaccggctttacatacaccccta 83484 



Query: 121 atgtcagtaaccaatgccatatccagtattattatacttggtgctttaattgctgcagga 180 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II II I II I I I I I I M I I I I I I I I I I I I I I I 
Sbjct: 83483 atgtcagtaaccaatgccatatccagtattattatacttggtgctttaattgctgcagga 83424 

Query: 181 agtgaattgatcggatgcataacctggttaggtggcatagccatattcattacttcaatt 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I J I I I I I I I I 
Sbjct: 83423 agtgaattgatcggatgcataacctggttaggtggcatagccatattcattacttcaatt 83364 

Query: 241 aatatttttggtggctttgtagtaactcaacgcatgcttcgcatgtataaaaaataa 297 

I I I I I II I I I I I f I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Sbjct: 83363 aatatttttggtggctttgtagtaactcaacgcatgcttcgcatgtataaaaaataa 83307 



Example 5: Other examples of sequence alignments



4546.3 (Seq ID 2365) 
3009.2 (Seq ID 1331) 



BLASTN 2.2.6 [Apr-09-2003]

Query= Lp Paris Contig48_66441-70516 
(4076 letters) 

Database : /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig 
51 sequences; 3,410,887 total letters 

Searching . done 



                                                        Score    E
Sequences producing significant alignments:             (bits)  Value

37                                                       1439   0.0



>37 



Length = 67519 



Score = 1439 bits (726), Expect = 0.0 
Identities = 784/802 (97%), Gaps = 1/802 (0%) 
Strand = Plus / Plus 



Query: 3275 gcaaatatggtaaaaatgttgacattaattagtggagtatgagatatttattttgcgagg 3334 

I I I I I I I I I I I I I I i I I I II I I I I ! I I I I I II II I I I I I I I I I I I 1 I I I I I I I I I I I I I 

Sbjct: 63920 gcaaatatggtaaaaatgttgacattaatcagtggagtatgagatatttattttgcgagg 63979 

Query: 3335 ttggacttgctatgtgtcattccactgggtcaattggcgacacactgattggccctttct 3394 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I [ I I I I I I I 
Sbjct: 63980 ttggacttgctatgtgtcattccactgggtcaattggcgacacactgattggccctttct 64039 

Query: 3395 attatcctgaaatcctgacaagagctctctatggcttaatctataagctgcttgtgatta 3454 

II II I I II III I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 64040 attatcctaaaaccctgacatgaactctctatggcttaatctataagctgcttgtgatta 64099 

Query: 3455 atttcatcgcaatataagccattaaaataccgctaagtaactctattttttgccatactt 3514 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 

Sbjct: 64100 atttcatcgcaatataagccattaaaataccactaagtaactctattttttgccatactt 64159 

Query: 3515 tattttgagttaacaggtttgaaaaataacgagtagtcatcgttaatgaactgaaccaaa 3574 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I i I I I I I I II | I I I I | | I | | | | | | | | | 

Sbjct: 64160 tattttgggttaataggtttgaaaaataacgagtagtcatcgttaatgaactgaaccaaa 64219 

Query: 3575 tcatgctggcagaaattacccctgctaaaaaagccagtttatggtcaggatattgactgc 3634 

M I I I I I I I I I I I I I 1 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Sbjct: 64220 tcatgctggcagaaattacccctgctaaaaaagccagtttatggtcaggatattgactgc 64279 

Query: 3635 tgccgctacccacaatcaccagggtgtctataatggcgtgaggattaagcagactaaacc 3694 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I It I I I I I I I M I I I I I I I I I I I 

Sbjct: 64280 tgccgctgcccacaatcaccagggtatctataatggcgtgaggattaagcagactaaacc 64339 

Query: 3695 ccagggcaaataaaatgatttgcattcttgtatgaggctgatgtgtttctacaacggttt 3754 

I I I II I II I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Sbjct: 64340 ccagggcaaataaaatgatttgcattcttgtatgaggctgatgtgtttctacaacggttt 64399 



Query: 3755 gtttgtttttggacagcgcgctttttaagttttttattgcataataaattaaaaaggcag 3814 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Sbjct: 64400 gtttgtttttggacaacgcgctttttaagttttttattgcataataaattaaaaaggcag 64459 
Query: 3815 agcctagccataccatccagatttgcaagtttggatgagccaaaagcaattgatgtaaac 3874 

I I I I I I I I I I I I I M I I I I I i I I I I I I I I I I I I I I | III I | | | | | | | | | | | | | | | | | 1 | 
Sbjct: 64460 agcctagccataccatccagatttgcaagtttggatgagccaaaagcaattgatgtaatc 64519 

Query: 3875 cagcgacactgccgcacaccaaaatgacatcacagaaaaagcaaatgacggcagagagag 3934 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 64520 cagcgacactgccgcacaccaaaatgacatcgcagaaaaagcaaatgacggcagagagag 64579 

Query: 3935 cggcatgattttttcgcgcaccttgcctgataagaaagacattttgcggacctaaggcca 3994 

I I I I I I I II I I i I II 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 64580 ccgcatgattttttcgcgcaccttgcctgataagaaaaacattttgcggacctaaggcca 64639 

Query: 3995 ttatcaaagataatcccaagagtaatccattaaaataaatcaacataattgcatcaggat 4054 

I II M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 64640 ttatcaatgataatcccaagagtaatccattaaaataaatcaacataattgcatcaggac 64699 

Query: 4055 agtaaaaaaaaaggcgattata 4076 

I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 64700 agt-aaaaaaaaggcgattata 64720 

Score = 957 bits (483), Expect = 0.0 
Identities = 495/499 (99%) 
Strand = Plus / Plus 

Query: 1 ctacaaattttgcaaggttattaaatagtggttttcatctggcggcctattgtttttttg 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I | | | | I | | 
Sbjct: 63429 ctacaaattttgcaaggtaattaaatagtggttttcatctggcggcctattgtttttttg 63488 

Query: 61 ggaaagccataagcattctgccaattgatccatgattaaatgttcaacagccatgggatc 120 

I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I 
Sbjct: 63489 ggaaagccataagcattctgccaattgatccatgattaaatgttcaacagccatgggatc 63548 

Query: 121 ctggtatttttctattaacttggtgtaaacagtacggatgccttgtggcctatcggtcgt 180 

III I I II I I I I I I I I I I M I I I I I I I I I I I I I I I I I I II I I I I 11 I I I I I I I I I I I I I 
Sbjct: 63549 ctgatatttttctattaacttggtgtaaacagtacggatgccttgtggcctatcggtcgc 63608 

Query: 181 tacttgttcgcgaacggcaagatgaagccccatatgaaggaagggattggtttcgcctaa 240 

I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Sbjct: 63609 tacttgttcgcgaacggcaagatgaagccccatatgaaggaagggattggtttcgcctaa 63668 

Query: 241 ttcaggataataagtatgttcaggaaaagattgaatttgttcaatgactttatggtattc 300 

I I I I II I I I II I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I II I I I I I I I 
Sbjct: 63669 ttcaggataataagtatgttcaggaaaagatggaatttgttcaatgactttatggtattc 63728 

Query: 301 cggatgatcaagaatcacttgggcaatttctttttccaagggagaaagttcttttttatt 360 

I I I I M I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 63729 cggatgatcaagaatcacttgggcaatttctttttccaagggagaaagttcttttttatt 63788 

Query: 361 ctggtacttattccagctgataaaaaatagctgtcgagtttcttgtactgtatcgccgta 420 

M I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 1 I 
Sbjct: 63789 ctggtacttattccagctgataaaaaatagctgtcgagtttcttgtactgtatcgccgta 63848 
Query: 421 aaacataatggcccgattgatataaaatgatccattttaactgaataaaaaagtaacaac 480 

I I I I I I t I I I I I I I I I I I f I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 63849 aaacataatggcccgattgatataaaatgatccattttaactgaataaaaaagtaacaac 63908 



Query: 481 aatgttgatgtgcaaatat 499 

I I I I I I I I I I I I I I I I i I I 
Sbjct: 63909 aatgttgatgtgcaaatat 63927 



Database : /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig 

Posted date: Sep 10, 2003 12:44 PM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 

Lambda K H 

1.37 0.711 1.31 

Gapped 

Lambda K H 

1.37 0.711 1.31 



Matrix: blastn matrix:1 -3



Alignment of a portion of contig48 (Seq Id 48) of the Paris strain with all the contigs of the 
Philadelphia strain. 

The positions of this fragment in the contig are indicated in the line starting with « Query= ». In
this example, the first position (noted 1 in the alignment) of the fragment is thus position 50760
in contig48 of the Paris strain. This fragment ends at position 56335 in contig48. 50760
should thus be added to the position indicated by the alignment to obtain the position in the complete
sequence of the contig. The positions in the contig of the Philadelphia strain are unchanged.
In this example, we can see that the regions of the Paris strain between positions
561(+50760) and 2096(+50760) and between 2622(+50760) and 3981(+50760) are absent from
the Philadelphia strain. These regions thus contain the following ORFs, specific to the Paris
strain:

322.3 (Seq ID 1466) 
321.3 (Seq ID 1462) 
3005.2 (Seq ID 1329)
5208.1 (Seq ID 2782) 
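
The absent regions quoted above can be checked from the query ranges of the alignments that BLASTN reports for this fragment. The following Python sketch is only an illustration, not the procedure used for the invention. The function name `uncovered_regions` and the 100-letter minimum alignment length are assumptions. The threshold is chosen so that the short 67-letter alignment at query positions 3653-3719 is ignored. The five query ranges are those that appear in the BLASTN output of this example.

```python
from typing import List, Tuple

# 1-based, inclusive (start, end) positions on the query fragment
Interval = Tuple[int, int]


def uncovered_regions(hsps: List[Interval], query_length: int,
                      min_hsp_length: int = 100) -> List[Interval]:
    """Return the query intervals not covered by any retained alignment.

    min_hsp_length is a hypothetical filter: alignments shorter than
    this value are ignored when deciding which regions are absent.
    """
    kept = sorted(h for h in hsps if h[1] - h[0] + 1 >= min_hsp_length)
    gaps: List[Interval] = []
    next_uncovered = 1  # first query position not yet covered
    for start, end in kept:
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= query_length:
        gaps.append((next_uncovered, query_length))
    return gaps


# Query ranges of the five alignments for Contig48 50760-56335 (5576 letters)
hsps = [(3982, 5576), (1, 560), (2172, 2621), (2097, 2202), (3653, 3719)]
print(uncovered_regions(hsps, query_length=5576))
# [(561, 2096), (2622, 3981)]
```

With the threshold lowered to 1, the 67-letter alignment is kept and the second region splits into 2622-3652 and 3720-3981.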

BLASTN 2.2.6 [Apr-09-2003]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.




Query= Lp Paris Contig48 50760-56335
(5576 letters)

Database: /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig
51 sequences; 3,410,887 total letters

Searching . done

                                                        Score    E
Sequences producing significant alignments:             (bits)  Value

37                                                       2892   0.0





>37 



Length = 67519 



Score = 2892 bits (1459), Expect = 0.0
Identities = 1561/1595 (97%)
Strand = Plus / Plus



Query: 3982 cttatggcaaatatttatccctcaaagcgtttctcaataagccattgttaccatgaacca 4041 

I I I ! I I I I I I I I I I I I I I I I I I I I I I MINIMUM] I M M M M I M M M I 
Sbjct: 50163 cttatggcaaatatttatccctcaaaatgtttctcaataagtcattgttaccttgaacca 50222 

Query: 4042 gggtaagctttaattttcttaaacaaaatggagtatttagtcctcccttagatggaagct 4101 

I I I I I I I I I I J I I I I I I I I J M I I I I I I I I I I I I 1 I I I I I I I I I I M M I M M M M 
Sbjct: 50223 gggtaagctttaattttcttaaacaaaatggagtatatagtcctctcttagatggaagct 50282 

Query: 4102 ctttcactgctttgctgagcaaatatttgacgctcctcactgattaattcaatccatttt 4161 

M I M M I M M M M M M I I I I I I I I I I I I I.I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 50283 ctttcactgctttgctgagcagatatttgacgctcctcactgattaattcaatccatttt 50342 

Query: 4162 ttaggtgtcatgtcgttttgtatcaataagttgatcgctgcactaggatattgactttgt 4221 

I I I II M I I I M M II I M I M M M II M M I I M I I M I I I I M M M M I I M I M I 
Sbjct: 50343 ttaggtgtcatgtcgttttgtatcaataagttgatcgctgcactaggatattgactttgt 50402 

Query: 4222 gtaatgacaaaataaaagtaagcggtcaatttgcttctgcataaagcccaaccacttttt 4281 

MINI I I I I I I I I I I I I I I I I I I M I I I I I I I I i I M I I I I I I I I I I M I I I I I I I I I 
Sbjct: 50403 gtaatggcaaaataaaagtaagcggtcaatttgcttctgcataaagcccaaccacttttt 50462 



Query: 4282 gcgacacattgttgcaaactagcaaattcaatttgatcttctttgactaattgttgcagc 4341
I I M I I I I II I I M I I I I II II I II I I M I I I I I II I I I I I I I I I I I I I I I I I I M I I I 
Sbjct: 50463 gcgacacattgttgcaaactaggaaattcaatttgatcttctttgactaattgttgcagc 50522 

Query: 4342 ttttgaatcgtgtcagggatacatgcatcaaaaaacaatgtggtcccttgctgtccccat 4401 

I'M II I I I II II I I I I I I I I I I I I I Ml I I I I I M I M I I I li M I M M I II I I M 
Sbjct: 50523 ttttgaatcgtgttagggatgcatgcatcaaaaaataatgtggtcccttgctgtccccat 50582 

Query: 4402 acgtcacttaatataattctttttaaaatgttcatgtcacaaatttgttgctcatgggga 4461 

I I M II M M M II I I I M I II M M II I I II M II II M I M II M M I II II I 1 
Sbjct: 50583 acgtcacttaatataattctttttaaaatgttcaagtcacaaatttgttgctcttgcgaa 50642 

Query: 4462 ataggtttctcaattctatgtgttaccggttgccccaacatggaatgtgtccattgctga 4521 

I II I I I I II I II I I I I II I I I I II I I M I II M I I II M I I II I M I I M M I II III 
Sbjct: 50643 ataggtttctcaattctatgtgttaccggttgccctaacatggaatgtgtccattgttga 50702 

Query: 4522 taataagcccgaccgtaaattaatataccacccaaaataaaatcaaaggcagttggggct 4581 

M I M I II I M I I II I II J II M M M I II M II I II II M M I I I M II I M I I II I I 
Sbjct: 50703 taataagctcgaccgtaaattaatataccacccaaaataaaatcaaaggcagttggggct 50762 

Query: 4582 gcataccaaagagcagtgaagaatttggatttgatgacgtccactttggggtagggatac 4641 

II II I I I I I I M I I M II M II I II I I I II II I II I I M II II II I I I II II I M M I 
Sbjct: 50763 gcataccaaagagcagtgaagaaattagatttgatgacgtccactttggggtagggatac 50822 

Query: 4642 tgatggttattggttttaatgatatgtttgcgatcgtattgcatttttaaaggttcatat 4701 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I II ' 
Sbjct: 50823 tgatggttattggttttaatgatatgtttgcgatcgtattgcatttttaaaggttcatat 50882 

Query: 4702 ttaaccaaggatttagcctggtattgattctggccaatacttatcgtttcttccttcttt 4761 

I I II M I I II II II II M II I M II II M I II I II I I M I I I II II I II I II I I M 
Sbjct: 50883 ttaaccagggatttagcttggtattgattctgaccaatacttatcgtttcttccttcttc 50942 

Query: 4762 ggtattcctccatcgactattgctaaatgaaagttatgatttaaaacatgttcataaccc 4821 

II I I I M II II M II I II M I M M M II I M I I M I I M M I I M II II I M I I I I M 
Sbjct: 50943 ggtactcctccatcgactattgctaaatgaaagttatgatttaaaacatgttcataaccc 51002 

Query: 4822 agaaaatgacaatcaggatttttagataaggtatttgctaaatgttgtcgcaatgcctcc 4881 

I I I I II II M II I M II II I I I I M II II M II I II M I I II I I M II I I M I I II I II I 
Sbjct: 51003 agaaaatgacaatcaggatttttagataaggtatttgctaaatgttgtcgcaatgcctcc 51062 

Query: 4882 ccgactttaatattaggatcataaaaatctggatgtttggctaaattcattacccaaaat 4941 

I II I II M I M I I II I I I II I II I M I I I I II II II II II I I M I II I M I I M I II I II 
Sbjct: 51063 ccgactttaatattaggatcataaaaatctggatgtttggctaaattcattacccaaaat 51122 

Query: 4942 acattgactttattcttttcaccagcaggaatatctcgtaacatcattccccaacaatac 5001 

I I I! II M I II I II I I I I M II II I II II M I II I I II I M II I I M I II II I I M I I I I 
Sbjct: 51123 acattgactttattcttttcaccagcaggaatatctcgtaacatcattccccaacaatac 51182 

Query: 5002 ccaataacttcgtcgtttttatccttggcaataaaacaaatagtctctttataatgtaac 5061 

M I I I I I I I I I I I I I I I I I I M II I II II II I I I I I I I I I I I II I I II I I I I II I I II II 
Sbjct: 51183 ccaataacttcgtcgtttttatccttggcaataaaacaaatagtctctttataatgtaac 51242 



Query: 5062 attgaagataggacgatatcccctataaaaggcaaatcgccaaaatttgctccatgtgct 5121
II II I I II I I ! II I I II M II II I It I I II I I II I M I I II M I I I I I M M I I II III 
Sbjct: 51243 attgaagataggacgatatcccctataaaaggcaaatcgccaaaatttgctccatgggct 51302 

Query: 5122 aaagaatcgatacgtttaatttcacccattgctttattaagttcagcagtattattggga 5181 

I I t I I I I I I I I i I I I I I I I I t I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I 
Sbjct: 51303 aaagaatcgatacgtttaatttcccccattgctttattaagttcagcagtattattggga 51362 

Query: 5182 tctaaaatctgaacttctttgaagccatagggcttgccttccatcatttttaacgaaaaa 5241 

I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Sbjct: 51363 tctaaaatctgaacttctttgaagccatagggcttaccttccatcatttttaacgaaaaa 51422 

Query: 5242 ttaaaggtacattcgccatctaactttttgtggtgtttaattttaagaggattttgattt 5301 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 51423 ttaaaggtacattcgccatctaactttttgtggtgtttaattttaagaggattttgattt 51482 



Query: 5302 ctgcaatttaacagatctgtccataattgatgaatgtaaagtaatctttcatcatttggt 5361
I I I I I I I I I I I [ I I I I I I I I I I J I I I I I 1 II I I I I I I I I I I II I I I I I I I I I I I II I I I 
Sbjct: 51483 ctgcaatttaacagatctgtccataattgatgaatgtaaagtaatttttcatcatttggt 51542



Query: 5362 gaggtatttcgcaaattgattaattgctgggtgaaacgatgcacatattctgtatggaga 5421 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I 
Sbjct: 51543 gaggtatttcgcaaattgattaattgttgggtgaaacgatgcacatattctgtatggaga 51602 

Query: 5422 taaatagcatggctactgaaatcaacaacaaactctgccgtcattaattttttgatttcg 5481 

I I I I I I I I I I I I I II I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I II M I II I I I I I I 
Sbjct: 51603 taaatagcatggctactgaaatcaacaacaaactctgccgtcattaattttttgatttcg 51662 

Query: 5482 ggtaagtgatattcaacaggatctgtatatgattttactattgcttctagtgaactccat 5541 

I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 51663 gataagtgatattcaacaggatctgtatatgattttactattgcctctaatgaactccat 51722 

Query: 5542 ttgcgttcaagtacatcgatttttgaatacgacat 5576 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 51723 ttgcgttcaagtacatcgatttttgaatacgacat 51757 



Score = 930 bits (469), Expect = 0.0 
Identities = 543/569 (95%), Gaps = 9/569 (1%) 
Strand = Plus / Plus 



Query: 1 atggcgaaatcattgtcgcaattagattctgctaatttgctcccctgttttaacatggaa 60
MINIMI I M II M I II I I I I I M I I M I M I I I I II I M M I I I I II I I I II 
Sbjct: 46540 atggcgaaaccattgtcgcaattggactctgctaatttgctcccctgttttaacatagag 46599



Query: 61 caagcagaacgcattggaaaacaaatcaataagctcttacagcatgagttttgcgaggaa 120 

I I I I I I I I I I I I I I I II II I II I I I M I I M I I I I I I I I I I I I II I I I I I I II I I I I I 
Sbjct: 46600 caagcagaacgcattggaaaacaaatcaataaactcttacagcatgaattttgcgaggaa 46659 

Query: 121 aacatcaatccaaagaaatttgcctctatcagtcacaatatcctgcccaaaattatgaca 180 

I M I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I M I II I I II I I I I I I I I 
Sbjct: 46660 aacatcaatccaaagaaatttgcctctatcagtcacaatatcctgcccaaaattatgaca 46719 



Query: 181 gaaacatttttaggagtaaccccgccagaaaactggcagcaattaagcgacgatattata 240
I I I I I M I I I I II I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I 
Sbjct: 46720 *--att t ttaggagtaacccctccagaaaactggcagcaattaagcgacgatattata 46779 

Query: 241 ttt?^???^???????????^?^?-??????? gagctggaagaa 300 
Sbjct: 46780 aaaaactgcatcgcaaacaaaaa^t^ 1 1 II 1 »' I I I I I I I I I I I I I I | MINIM 

y c 9 c aaacaagaatctatgcaaaaaagcagctcgcaaagagttggaagaa 4 6839 

Query: 30! tgca tc --cgagaat tc ctttga tc ctgataca atttggcccctggcttgctcaaaat 360 
46840 tJS^^ 46899 

Query: 361 ffi°«««ttg„t^ 

46900 tgtccacaattgaataagtctt^ 46959 

Query: 421 -^J-"aatgaaaacaaaagtgccgagtaatcgagacaggcttaattacgagagtta 480 
S.ct: 46960 ^=U=^ 47Qi9 

Query: 481 tgcctgataaaaccacatttatttacctaat ,-+■ - 

I I I | | | | | | I I | ||,||,| 7,m ??,??7 ttcatcaaatataactcacc 531 

S bJ « : ,1020 ^ii^^^iiMi teMet ^^-MM t M i 4?o79 

Query: 532 gatatgatctacaaccaagttcttaaaac 560 

I I I I I I I I I I I I I I I I | || | || | | | || | 
Sbjct: 47080 gatatgatctacaaccaggttcttaaaac 47108 

Score = 656 bits (331), Expect = 0.0 
Identities = 423/451 (93%), Gaps = 2/451 (0%) 
Strand = Plus / Plus 

Query: 2172 tatatggccgataaaatttgccagggtcaataaatagtattctgatggttaaataataaa 2231 

— — ;z 

Query: 2232 ^gatgaagagttcttttggtgaaatgaataaaagaccccttttttattgagcgactctta 2291 

£:~.EHS^ 2350 

5 c ""9Ctttatcctgtgcttttg gc aagcagtatggtcgcattcaggctt,:g 48105 

s bJ ct, „„, "^ii;^^^.^;i^^i t ^--uiUc;;giii^ui^ ,„„ 

2,11 tt„.t.^^t^...^c..« g t e.gta.gg.ttgococtgta.tcag.t 2,10 
m „ "^^^^i-lali^^U^^ligii^-i^giil^igci ,322. 

°— 2471 ^m??rrnr???n?????nfnr??Trrtnrr?tnfnnr?nfrnrn 2 -° 
Sbjct: 48226 gggtaagccgattggcttcggcacatctcgaatcagtggcaacctctttcgctttttcat 48285 

Query: 2531 ctttatttcgcataacaatcctgtgaagttaatctttgcagaggacaccatgatggtttc 2590 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I I I I I I I I i I 
Sbjct: 48286 ctttatttcgcataacaatcctgtgaagttaatctttgcagaggacactatgatggtttc 48345 



Query: 2591 atgtcatacaaacgaagcaaccggataccga 2621 

I I I I I I I I I I I I I I I I I I I I I I I I MINI 
Sbjct: 48346 atgtcatacaaacgaagcaaccgggtaccga 48376 



Score = 147 bits (74), Expect = 9e-35 
Identities = 98/106 (92%) 
Strand = Plus / Plus 



Query: 2097 acctttctggagtttcccaactacaagatgatactgcgttataataactccatttattat 2156 

I I I I I I I I I I I i I I I I I I I I I I I I II I I I I II I I I I I I I I I I II i I I I I I I I I I I I 
Sbjct: 44922 accttcctggcgtttcccaactacaagatgatcctacgttataataactccatttattat 44981 

Query: 2157 actggggctatcgagtatatggccgataaaatttgccagggtcaat 2202 

I I I I I I I I I I I MINIMI I I I I [ I M I II II II I I M M ! 
Sbjct: 44982 gccggggctatcgggtatatggctgataaaatttgccagggtcaat 45027 



Score = 109 bits (55), Expect = 2e-23 
Identities = 64/67 (95%) 
Strand = Plus / Plus 



Query: 3653 cgaatccttacaggaaaacgaagcttatggaagtccaacaaggaagaggtagtaaattca 3712 

I M I I I I I I M I I I I I f I I I I I II I M I I I I 1 I I I I I I I I I I I I I II I I I I I I II I I 
Sbjct: 48378 cgaatccttacaggaaaacgaagcttatggaaggccaacaaggaacagggagtaaattca 48437 



Query: 3713 taacgcc 3719 
II I II I I 

Sbjct: 48438 taacgcc 48444 



Database : /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig 

Posted date: Sep 10, 2003 12:44 PM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 



Lambda     K      H
    1.37    0.711     1.31

Gapped
Lambda     K      H
    1.37    0.711     1.31

Matrix: blastn matrix:1 -3
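The Lambda and K values printed in the report link each raw score (the number in parentheses after the bit score) to the bit score and to the Expect value through the standard Karlin-Altschul relations. The following is a minimal sketch in Python, assuming the rounded values printed in the report and ignoring the effective-length correction applied by BLAST; the function names are illustrative and are not part of BLAST.

```python
import math

# Values as printed in the report (rounded to three significant figures).
LAMBDA = 1.37
K = 0.711
DB_LETTERS = 3_410_887  # "Number of letters in database"


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len=DB_LETTERS, lam=LAMBDA, k=K):
    """Expect value E = K * m * n * exp(-lambda * S).

    Assumption: the full query and database lengths are used as m and n,
    without the edge-effect correction that BLAST itself applies.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    print(round(bit_score(331), 1))
    print(expect_value(47, 7959))
```

With the rounded Lambda of 1.37, a raw score of 331 gives roughly 655 bits rather than the 656 bits printed in the report; the small difference comes from the rounding of Lambda in the printout.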



Alignment of a portion of contig45 (Seq Id 45) of the Paris strain with all the contigs of the
Philadelphia strain.
The positions of this fragment in the contig are indicated in the line beginning with « Query= ».
In this example, the first position (noted 1 in the alignment) of the fragment is thus position
44722 in contig45 of the Paris strain. This fragment terminates at position 52680 in
contig45. It is thus necessary to add 44722 to the position indicated by the alignment to obtain
the position in the total sequence of the contig. The positions in the contig of the Philadelphia
strain are unchanged.
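The coordinate shift described above can be written out explicitly. This is a minimal sketch in Python, assuming 1-based positions throughout; the function name is illustrative. Because position 1 of the fragment corresponds to position 44722 of the contig, the strict conversion is the fragment start plus the alignment position minus one, which is consistent with the 7959 letters of the query ending at position 52680; the rule of simply adding 44722 given in the text is a shorthand that differs from this by one base.

```python
def fragment_to_contig(pos_in_fragment, fragment_start):
    """Convert a 1-based position in the extracted fragment (the BLAST query)
    into a 1-based position in the full contig."""
    if pos_in_fragment < 1:
        raise ValueError("positions are 1-based")
    return fragment_start + pos_in_fragment - 1


if __name__ == "__main__":
    # Fragment "Contig45_44722-52680", 7959 letters long.
    print(fragment_to_contig(1, 44722))
    print(fragment_to_contig(7959, 44722))
```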

In this example, we can see that the region of the Paris strain between the positions 
1333(+44722) and 6899(+44722) is absent from the Philadelphia strain. This region thus 
contains the following ORFs, specific to the Paris strain: 



4927.1 (Seq ID 2623)
413.5 (Seq ID 2069)
415.2 (Seq ID 2087)
417.3 (Seq ID 2102)
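The absent region quoted above can be read from the query ranges of the two flanking alignments in the BLASTN report for this fragment (positions 1215-1332 against contig 49 and positions 6900-7959 against contig 48): it is the stretch of the query left uncovered between them. The following is a minimal sketch in Python, assuming the alignments are supplied as (start, end) pairs in fragment coordinates; the function name is illustrative.

```python
def internal_gaps(hsp_ranges):
    """Return the query intervals lying between consecutive alignments.

    hsp_ranges: iterable of (start, end) query coordinates, 1-based and
    inclusive, in any order. Only gaps bracketed by two alignments are
    reported; unaligned query ends are ignored.
    """
    ordered = sorted((min(a, b), max(a, b)) for a, b in hsp_ranges)
    gaps = []
    if not ordered:
        return gaps
    covered_to = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_to + 1:
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    return gaps


if __name__ == "__main__":
    # Query ranges of the two alignments reported for the contig45 fragment.
    print(internal_gaps([(6900, 7959), (1215, 1332)]))
```

Applied to the contig45 fragment this yields the interval 1333 to 6899 given in the text; the same function applied to the query ranges reported for the contig39 fragment (1-1263 and 4466-4983) yields 1264 to 4465.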



BLASTN 2.2.6 [Apr-09-2003]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= Lp Paris Contig45_44722-52680
(7959 letters)

Database : /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig 
51 sequences; 3,410,887 total letters 

Searching . done 



Sequences producing significant alignments:

                                   Score     E
                                   (bits)  Value

48                                  1943   0.0
49                                    94   2e-18

[Graphical overview: distribution of the hits along the query (axis 0 to 7500), color-coded by alignment score.]



>48 

Length = 263853 
Score = 1943 bits (980), Expect = 0.0
Identities = 1040/1060 (98%) 
Strand = Plus / Minus 



Query: 6900 attctaatggtggccagaggcagaatcgaactgccgacacgaggattttcagtcctctgc 6959 

I I I I I I I I I I I I I I I I I I I II i I I I I I I I I I I I I I I I II I I I I I I I I I I i I I I I I I I I I I 
Sbjct: 55098 attctaatggtggccagaggcagaatcgaactgccgacacgaggattttcagtcctctgc 55039 

Query: 6960 tctaccgactgagctatctggccactcaaggcttgctattaaactccgagatcgactttg 7019 

I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 55038 tctaccgactgagctatctggccactcaaggcttgctattaaactccgagatcgactttg 54979

Query: 7020 agtcaagtgttgattaagttttttttgaaaataattttttttggtaggctatggcaatct 7079 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Sbjct: 54978 agtcaagtgttgattaagttttttttgaaaataattttttttggtaggctatggcaattt 54919

Query: 7080 ctgctcctagaaaagcgttgtttttgataagctcaatattagcttccaggctcttgcctg 7139 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 
Sbjct: 54918 ctgctcctagaaaagcgttgtttttgataagctcaatattagcttccaggctcttgcccg 54859 

Query: 7140 cagttaattcagcaatacgttttaacaggaagggggttaatgattttccgctcatgtgtt 7199 

I I I I II I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Sbjct: 54858 cagttaattcagcaatacgttttaacaggaagggggttaatgattttccgctcatgtgtt 54799 

Query: 7200 tggcttcttcatgagcctgttttatgtatggactgatttcctcatcagatagttccgctg 7259 

II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MINIM I I I I 
Sbjct: 54798 tggcttcatcatgagcctgttttatgtatggactgatttcctcatcggatagttcggctg 54739 



Query: 7260  atactggaatggggtttgcgacgactattccgtttttcatgttcaatttctgttgaattg 7319
             ||||||||||||||||||||||||||||||| |||||||| |||||||||||||||||||
Sbjct: 54738 atactggaatggggtttgcgacgactattccatttttcatattcaatttctgttgaattg 54679



Query: 7320 acatgagatttgctacttcctcgaccgaatttaagcgttgtgggactggtattccactcg 7379 

I I I I I I I I I I II I I I I I I I M II II I I I I I I M M I II I I I II II I I I M M I I I II 

Sbjct: 54678 acatgagatttgctacttcctcggccgagtttaagcgttgtggaactggtattccactcg 54619 

Query: 7380 atctgctgtaaaaagcagggaattcgtctgtggcataacctatgaccggcaccccaaacg 7439 

I I I I II I M M M I I I I I II I I I II I I M I I I II M M I M M I I I II I I I I I M I I II 

Sbjct: 54618 atctgctataaaaagcagggaattcgtctgtggcataacctatgaccggcaccccaaacg 54559 

Query: 7440 tttcaagaacttccaatgtttttggtaagtcgagaatcgattttgcgccagaacagacta 7499 

I I I I I I I I M I I M I I I I I I II I I I I II M II I M I M II II I I II I M I I II II M I II 

Sbjct: 54558 tttcaagaacttccaatgtttttggtaagtcgagaatcgattttgcgccagaacagacta 54499 

Query: 7500 cggtaactggcgtattggatagttctataagatcagctgaaatatcaaaactcattgtca 7559 

I I M I I I I I II I I I I I II I M I M I I I i I I I I M I I M I I I M I I I I I I I M I I I I I I 

Sbjct: 54498 cggtaactggcgtattggatagctctataagatcagctgaaatatcaaaactcatcgtca 54439



Query: 7560 cgtcttgatgaacaccacctatacctccggtgacaaatagggggagcctagccatatggg 7619

I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Sbjct: 54438 cgtcttgatgaacaccacctatacctccggtgacaaatagggggagcttagccatatggg 54379 
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Query: 7620 "^^^^^^ctgctacogttgtgctggcagtcactttacgagataatacaaaag 7679 

11 M ' 1 I I I I I I! I I I I M II I I I I I I I I I I I I I I I I I I i i ii i i i i i i i i i i i 7 
Sb.ct: 54378 cacaaaacatggttgctgctaccgtcgtgiUgii^^^iiiii^iiii^^^^ ^ 



Query: 7680 a ^^tctctgcgagaggcttttattacttctttttgcagcgcgagatgct^ 7739 
<!M^. rjho 1 11 ' 1 ' 1 1 1 1 1 1 1 1 1 1 I I I I I I I II I I I I I | I | | | | | I HIM || in || || MINI 
Sbuct: 54318 aaatgtctctgcgagaggcttttattacttctttttgaagcgcglgatgtticaraaitr 54259 

Query: 7740 Ct )^^ 7799 
„ h ^, , I ' I I I I I I I I I | | | | I I M I I I I I I I I | | | | | I I I I I I I I I I I I I | | | | | | I | | I | | I | 

Sb:ct: 54258 cttgagttaaaccaacacggattttcccttggtgcatcgctatagtagctggaarggigi 54199 

Query: 7800 cttgtctacgaataatattttcaacttctattgccgtagttaaattatcagggtagggca 7859 

I I I I I I I I I I I I I I I I I I I N I I I I I I I I I I I I I I I I I I I II I I Ml I I I I I I 

Sb D ct: 54198 cttgtctacgaataatattttcaacctctattgccg^ 54139 

Query: 7860 ttccatgagagataatggtcgactcaagagcaacaattgggtttttatcattgatagcat 7919 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I I I I I I I I I I I I | | I I I I I I | I | | | | I I I I I I II I I I I I i i i i 
Sb^ct.- 54138 ttccatgagagataatggtcgactcaagagclliaartgggi.ittiioatrglrlgclt 54079 

Query: 7920 ccagtacttcttcgttaaattccaacaagtcatgaaacat 7959 
„ h . . 11 I' I III I MM II Mill II MM III I II II II I III 

Sbjct: 54078 ccagtacttcttcgttaaattccaacaagtcatgaaacat 54039



>49 

Length = 376826 

Score =93.7 bits (47), Expect = 2e-18 
Identities = 102/119 (85%), Gaps = 1/119 (0%) 
Strand = Plus / Plus 



Query: 1215   atttgccctgtgtattgtttagtgttggtcgagcggttcactctctgttgaaacccggta 1274
              ||||||||||| | | |||||||||||  ||||||| ||||||   | ||||||| ||
Sbjct: 207683 atttgccctgtctttggtttagtgttgaacgagcggctcactcaaagatgaaacctggcc 207742

Query: 1275   aaa-ccgtaaagctcgaagaagggggcaaatcaatcgttataaggcaaacgatcccgcc 1332
              ||| || |||||||||||| |||||||||||||||||||||||||||||||||| ||||
Sbjct: 207743 aaagccataaagctcgaagtagggggcaaatcaatcgttataaggcaaacgatctcgcc 207801

Database: /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig
Posted date: Sep 10, 2003 12:44 PM
Number of letters in database: 3,410,887
Number of sequences in database: 51

Lambda     K      H
    1.37    0.711     1.31

Gapped
Lambda     K      H
    1.37    0.711     1.31

Matrix: blastn matrix:1 -3
Alignment of a portion of contig39 (Seq Id 39) of the Paris strain with all the contigs of the
Philadelphia strain.

The positions of this fragment in the contig are indicated in the line starting with « Query= ». In
this example, the first position (noted 1 in the alignment) of the fragment is thus position 3990 in
contig39 of the Paris strain. This fragment terminates at position 8972 in contig39. 3990 should
thus be added to the position indicated by the alignment to obtain the position in the total sequence
of the contig. The positions in the contig of the Philadelphia strain are unchanged.
In this example, we can see that the region of the Paris strain between the positions 1264(+3990)
and 4465(+3990) is absent from the Philadelphia strain. This region thus contains the following
ORFs, specific to the Paris strain:



3396.1 (Seq ID 1588) 

3395.2 (Seq ID 1587) 
3394.1 (Seq ID 1586) 



BLASTN 2.2.6 [Apr-09-2003]

Query= Lp Paris Contig39_3990-8972
(4983 letters)

Database: /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig
51 sequences; 3,410,887 total letters 

Searching . done 



Sequences producing significant alignments:

                                   Score     E
                                   (bits)  Value

38                                  2123   0.0
49                                   638   0.0

[Graphical overview: distribution of the hits along the query (axis 0 to 4500), color-coded by alignment score.]

>38



Length = 17003 
Score = 2123 bits (1071), Expect = 0.0
Identities = 1215/1263 (96%)
Strand = Plus / Minus

Query: 1 ^"ggttctacatgagcttgcctgagatgtttgcccttatcgattgcaacaatttttac 60 

Sbjct- 12277 iiiiil/i'I 1 "! 11 IMM " Mlllllll MM, MUM | | ,,,, |M 

Sbjct. 12277 ctttggttctacatgagcttgcctgagatgtttgcccttatcgattgcaacaatttttac 12218 

Query: 61 gccagttgtgagcgtttgtttcgtcctgatttaaaggacgtccccatcgtggtgctatcc 120 
Sbjct. 12217 gccagttgtgagcgtttgtttcgtcctgatttaaaggacgtccccatcgtggtgctatcc 12158 

Query: 121 ^taatgatggctgttgtatcgcacgctcgaatgaagccaaagcattgggcattgccatq 180 

sbict- 12157 lit" 1 ' ' " • " " 1 • 1 1 " 1 " " u 1 1 ( i n 1 1 1 1 n iTi 1 1 1 ????? TT ????TT 

Sbjct. 12157 aataacgatggctgttgtatcgcacgctcgaatgaagccaaagcattgggcattgccatg 12098 

Query: 181 ^cgagccgtacttcaaaattaaaca^ 24Q 
Sbict • i?nc7 '''''''''''' ' I I I I I I I I I I I | | | | | | | | | | | | | | | | j | | I I I I 

Sbjct. 12097 ggcgagccgtacttcaaaattaaacatttgtgcaaacagcatggagtglllgctititcc 12038 

Query: 241 ^^^"^acgctgtatggcaacatgagtcatcgtgtgatgtgcactattgaagaagcc 300 
Sbjct: 12037 tcaaattatacgctgtatggcaacatgagtcatcgt^g^gi^ili^^^^i 11978 

Query: 301 ^^catatagaagt^ 

SM... I 111111 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I II I I I I I I I I M I I I I I I I | | | | | | | | | | | | | | | | , , 

Sbjct. 11977 tggccccatatagaagtttactcgattgatgaagcgtttcttgatttaagglgtiiiccg 11918 

Query: 361 ^^^^tgattcgttttgcgagcagttaca 420 
Sbjct: 11917 gttgatagctatgattcgttttgcgagcagUicalilgiili^iUiagcaciclgga 11858 
Query: 421 atacccacttccatcggtattggacctactaaaacactagctaaagccgccaatcattta 480 

' ' ' ' ' ' ' I I I I I I I I I I I I I I I I I I I I I I I I I | I I I II I | II II I I I I I I I I 

Sbjct: 11857 atacccacttccatcggtattggacctacialliclcilgciaaagccgcclaiciitia 11798 
Query: 4 81 tgcaaaaaagtttataaaatccctgtgtttaatatcacctcgaatcgtgggcggttattg 54 0 

' ' ' 1 1 1 1 1 1 1 11 1 1 1 1 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I Ml I 7 

Sbjct: 11797 tgcaaaaaagtttataaaattcctgtgttiaiiliciccicgaaicgiggg^iliU 11738 



Query: 541 "acagatttccgttggggacatttggggagtagggcggcaatgggccaataaattaatt 

M 1 1 11 1 1 1 1 ' 1 1 1 I I I I I I | | J | | | | I I I I | I M I I | I I I . | | I | , , | mi,,,,, 

Sbjct: 11737 caacagatttccgttggggacatttggggigiig^^iil^^lli^ii^];; 11 678 



600 



Query: 601 J^cgaggcattcatacggcttatgatttggcaatgaccaatcctcaccttctgaagaaa 660 
Sbjct: 11677 tcgcgaggcattcatacggcttatgatttggcaatgaccaatcctcaccttcigaaglil 11618 



Query: 661   [sequence illegible in the source] 720
Sbjct: 11617 [sequence illegible in the source] 11558
Query: 721 ttagaggcaatagagcctaagcaaagtattatgtcatctaaaagttttggtcagatgcaa 780

' f I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I ! 

Sbjct: 11557 ttagaggtaatagagcctaagcaaagtattatgtcatccaaaagttttggtcagatgcaa 11498 

Query: 781 actcaacttgcttcgattgaggaatcaatcagtagccattgtgcccgtgcggtggagaaa 840 

MINI I I I I I I I I I I II I I 1 I I I I I I I I I I I I I I I I | | | | | | I 1 I I I I I I I I I I I I I 
Sbjct: 11497 actcaaattgcttcgattgaggaatcaatcagtagccattgtgctcgtgcggtggagaaa 11438 

Query: 841 atgcgtcgccagcaattggtggcgaagcgcctggttgtatttgtgcatacgaaccgattt 900

I I I I I I I I I I I I f I I 1 I I I I I I I I III I I I I I I I I I I I I II I I II I 1 I I I I I I I I I I 
Sbjct: 11437 atgcgtcgccaacaattggtggcgacgcgtctggttgtatttgtgcatacgaaccgattt 11378 

Query: 901 cgcgaagatttggcacagcactttcagtccatcgaatttaagctgattaatcctacagat 960 

M MINIMI! I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I 1 I II I I I I I I I I I 
Sbjct: 11377 cgggaagatttggctcagcactttcagtccatcgaatttaagctgattaatcctacagat 11318 

Query: 961 gatttgcgcttaattaccaaaatggccaagcgatgtctgcaacgcatttttaaaccaggg 1020 

I I I I I I I I I I I I I I I I I I I I I I I Mill I I I I I I I I I I I I I I I I II I I I 1 I I I I I I I 
Sbjct: 11317 gatttgcgcttaattaccaaaatagccaaaagatgtctgcaacgcatttttaaaccaggg 11258 

Query: 1021 tattactataaaaaggcaggagtatgtcttgaggacttaattcccaaaaacccacgacag 1080 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I MINIMI 
Sbjct: 11257 tattactataaaaaggcaggggtatgtcttgaagacttaattcctaaaaaaccacgacag 11198 

Query: 1081 ctggatatgttttatcaaccaagtgacgagcatctaaaccacacggaacaattgatggcg 1140 

M I II M I I II I I I I I II I I I II I I II M I II II I I I I II 1 M I I II II II I I I 
Sbjct: 11197 ctggatatgtttcatcaaccaagtgatgagcatctaaaacacaccgaacaattgatgggt 11138 

Query: 1141 gtctttgaccaaatcaatcaaaaatacggacgaagtacaatccgcctcgcggcagagggt 1200 

II M I I I I M I I I I I M II M 11 II III I I I I M II I I I II II II M 
Sbjct: 11137 gtctttgaccaaatcaatcaaaagtatggaagaagcacgattcggttagccgccgaaggc 11078 

Query: 1201 tattcaaaaccttgggcgatgcgtgctgaactgaaatcgcctgcctataccacacgatgg 1260 

II II II I II II II I I II I M I II I II I I II II II I Mill I II I II M II II I I 
Sbjct: 11077 tattcaaaaccctgggagatgcgtgctgagctgaaatcacctgcttataccacgcgatgg 11018 

Query: 1261 tct 1263 
I I I 

Sbjct: 11017 tct 11015 



>49 



Length = 376826 



Score = 638 bits (322), Expect = 0.0 
Identities = 469/518 (90%) 
Strand = Plus / Plus 



Query: 4466 gaacaataatcactgataaaaagatcttgagcaaaagtctcaaaatcaaaatagcagatc 4525 

II I M II I'l I I I I I I M I I II II M II II I II I I I M II I M I I II II I I I i I 11 
Sbjct: 203775 gaacaataatcactaataaaaagatcccgagcaaaagcctcaaaatcaaaatagcagctc 203834 



Query: 4526 aaactatccggcatttgatgggcataacagtcatcaaataattgcctggcaaaatcaacc 4585 
II II I M II II M I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II II II II I II I M 
Sbjct: 203835 aaactatccggtatttgatgggcataacagtcatcaaataactgtctggcaaaatcaacc 203894

Query: 4586 tctgaatcataacaaccttggtaatgatcctccaacatggtttgtgcatcatctaccgag 4645

I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I j I I I I I I I MINIM 
Sbjct: 203895 tctgaatcatagcaaccttggtaatgatcctccaacatggtttgtgcatcttctaccgaa 203954 

Query: 4646 taatcacagagcagggctaatcctaactctccgtgttcttgaatgaatgaagcatattcc 4705

M M I I M i I M M M M MM M M M M M M ! M M M M M M M I Ml 
Sbjct: 203955 taatcgcagagaagggctaagcctagctctccatgttcttgaataaatgaagcatactcc 204014 

Query: 4706 acaatgttgcttattccttcatattcatgaattctcatgctgccgaaaccttcataatcg 4765 

I I II I II I II M I j I II Mill III I II M II M I I I II II I I I II I M 
Sbjct: 204015 acaatattacttattgactcgtactcatggattttgatgctgccgaatccttcataatcg 204074 



Query: 4766   tggatggcaaattcctcagcattggtttctgggctattatccaacatttcccagatttct 4825
              || |||||| |||| ||||||||||  || ||||| ||||||| ||||||||||||||||
Sbjct: 204075 tgtatggcatattcttcagcattgggctcagggctgttatccagcatttcccagatttct 204134



Query: 4826 ttcatgatgtcatcttcgctttgagtagcatctatccatacaccatgcagtatggcgttg 4885 

i III I M II M II M M I II I II I I II I I M M I II II II I 1 I I I I II II Ml 
Sbjct: 204135 tccataatgtcatcttcgctttgtgtggcatctatccaaacaccatgcaggatggcattg 204194 

Query: 4886 ttgtaagaggctaaacaggcgacgtagattgaaggggtgtccatgggattatctccttgt 4945 

Mill I II II II I II J I I II I I I I I M I II I II I II I II I 1 I I II II II I I I II II I 
Sbjct: 204195 ttgtatgaggctaaacacgcgacgtagattgaaggggtgtccatgggattatttccttgt 204254 

Query: 4946 attaagggagctatcccacacgggagcttgctcccgtg 4983

M I I I M II II I II I M I M M M I I M II I I II M 
Sbjct: 204255 attaagggagcaatcccacacgggagcttactcccgtg 204292 

Database : /local/http/htdocs/IPF_Gmp/legionella/blastdb/contig 

Posted date: Sep 10, 2003 12:44 PM 
Number of letters in database: 3,410,887 
Number of sequences in database: 51 



Lambda     K      H
    1.37    0.711     1.31

Gapped
Lambda     K      H
    1.37    0.711     1.31

Matrix: blastn matrix:1 -3
Example 6: RESULTS

See Tables V, VI, I and II hereinbelow.

Example 7: Sequencing and annotation of the genome of L. pneumophila Lens strain

Comparison of the genome sequences of L. pneumophila Paris strain, Lens strain and
Philadelphia strain (http://genome3.cpmc.columbia.edu/~legion/index.html), three strains of
serogroup 1, shows that around 88% of these genomes are very highly conserved (95 to 100%
protein identity), whereas the remaining 12% are specific to each strain. These results
suggest that there is considerable genomic diversity within the L. pneumophila species.

Table XVI hereinbelow gives, for each of the ORFs identified in the Lens strain, its position
on the genome, the presence of a signal peptide and the best result of the BLAST search
against nrprot (Best-Blastp). The ORFs specific to the L. pneumophila Lens strain relative to
the L. pneumophila Paris strain were identified by considering as specific the ORFs having a
percentage of protein homology of less than 75%. Where the ORF is conserved in the two
genomes, the percentage of homology between the two proteins is given. Finally, the ORFs
specific to the Legionella genus were identified by considering as specific the ORFs having
a percentage of protein homology with sequences of the nrprot bank of less than 25%.

In conclusion, these results help define DNA probes for developing a typing tool. Applying
this tool to a large number of strains isolated from patients and strains isolated from the
environment can confirm whether it can predict the risk associated with a strain by reliably
discriminating the strains isolated from patients from the other strains.

Material supplied:

• The complete sequence of the genome of Legionella pneumophila Lens strain, made up of a
chromosome of around 3.33 Mb and a plasmid of around 60 kb.

• A list of the specific coding sequences of L. pneumophila Lens strain, annotated, with
their nucleotide sequences.
Materials and methods.

1. Construction of the shotgun bank of small fragments (size 1.5 to 2.5 kb)

The chromosomal DNA of the strains studied was prepared by a classic method including
proteinase K treatment and phenol extraction (9). Around 40 µg of DNA were fragmented by
nebulization (1 minute under a pressure of 1 bar) (4). The ends of the DNA fragments were made
blunt by treatment with the DNA polymerase of bacteriophage T4 for 15 minutes at 37°C in the
presence of the 4 nucleotide triphosphates. The enzyme was inactivated by incubation for
15 min at 75°C. Adaptors (Invitrogen, Cat. No. 408-18) were then ligated to these ends. After
ligation, the fragments of chromosomal DNA having a size between 1500 and 2500 base pairs were
purified after electrophoresis on agarose gel. The vector utilized for construction of the
bank, pcDNA2.1 (Invitrogen), was digested by the enzyme BstXI and purified by Geneclean
(BIO-101) after electrophoresis on agarose gel. The chromosomal DNA and the purified vector
were ligated by the ligase of bacteriophage T4. The ligation mixture was introduced by
transformation into the Escherichia coli strain XL2-Blue (Stratagene). About 4000 colonies
were obtained per µl of the ligation mixture.

2. Preparation of plasmids and sequencing

The plasmids were prepared from bacterial colonies with the «TempliPhi DNA sequencing
template amplification» kit marketed by Amersham Bioscience. The chromosomal inserts were
sequenced from both ends using the universal primer T7, following the recommendations of the
supplier (Applied Biosystems). The sequences were determined using automatic sequencers of
type 3700 (Applied Biosystems).

3. Assembling of sequences

The sequences were assembled using the set of software developed at the University of
Washington: Phred, Phrap and Consed (5, 8). The sequence was finished using the CAAT-box
software suite (7). The finishing stage consists of resequencing the regions where the
sequence is of low confidence and sequencing the regions situated between the contigs. It
was done either by sequencing PCR products or by working on the clones of the bank. The
oligonucleotide sequences were defined using the Consed and Primo software (8, 10).
3. Annotation of sequences

The identification of the coding sequences (CDS) was carried out using the CAAT-box software
suite (7). This program combines the results of different methods: (i) identification of open
reading frames and their sorting as a function of their size, (ii) analysis of the coding
probability using Genemark software (11), (iii) identification of a translation start
(initiation codon and ribosome binding site), (iv) similarity of the deduced protein sequence
with the protein sequences contained in the sequence banks, using BLASTP software.
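Step (i) above, the identification of open reading frames and their sorting by size, can be illustrated as follows. This is a minimal sketch in Python and not the CAAT-box implementation: the set of start codons, the minimum length and the restriction to the three forward frames are assumptions made for the illustration, and the function name is hypothetical.

```python
STOP_CODONS = {"taa", "tag", "tga"}
START_CODONS = {"atg", "gtg", "ttg"}  # assumed set of bacterial start codons


def find_orfs(sequence, min_codons=30):
    """Scan the three forward frames and return ORFs sorted by decreasing size.

    Each ORF is reported as (start, end, frame) with 1-based inclusive
    coordinates; 'end' includes the stop codon. The reverse strand is not
    scanned in this sketch.
    """
    seq = sequence.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


if __name__ == "__main__":
    toy = "cc" + "atg" + "gct" * 40 + "taa" + "g"
    print(find_orfs(toy))
```

In a real annotation pipeline the candidates found in this way would then be filtered using criteria (ii) to (iv).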

The functions of the proteins coded by the identified coding sequences were predicted by
analysis of the results of similarity searches in the non-redundant bank of the NCBI
(http://www.ncbi.nlm.nih.gov/BLAST/) using BLASTP software (1).

4. Comparison of the genomes - identification of the CDS specific to L. pneumophila Paris
strain

The set of protein sequences deduced from the predicted coding sequences of each genome was
compared to the set of protein sequences possibly coded by the other genome using BLASTP
software. A threshold of 75% identity over the total length of the protein was retained for
identifying the proteins specific to an isolate. This very high value was retained because it
best discriminates orthologous genes from paralogous genes (6). For the protein sequences
whose conservation is high (> 70%), the conservation of the nucleotide sequences of the genes
will also be high and could give a signal in hybridization conditions of low stringency. This
possibility will have to be taken into consideration in the analysis of the test result.
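The decision rule described in this section can be summarized as follows. This is a minimal sketch in Python, assuming that the best BLASTP hit of each protein in the other genome is available as a percentage of identity over the full protein length (or as None when there is no significant hit); the function names are illustrative.

```python
SPECIFICITY_THRESHOLD = 75.0   # % identity, from the text
CROSS_HYBRIDIZATION_FLOOR = 70.0  # % identity above which a signal may appear


def is_strain_specific(best_identity, threshold=SPECIFICITY_THRESHOLD):
    """A protein is treated as specific to an isolate when its best hit in
    the other genome is below the threshold, or when there is no hit."""
    return best_identity is None or best_identity < threshold


def may_cross_hybridize(best_identity, floor=CROSS_HYBRIDIZATION_FLOOR):
    """Flag specific proteins whose conservation is still high enough that
    the gene could give a signal under low-stringency hybridization."""
    return (
        best_identity is not None
        and is_strain_specific(best_identity)
        and best_identity > floor
    )


if __name__ == "__main__":
    # Identities taken from the annotation examples in section 5.
    for gene, identity in [("2795.1", None), ("3866.1", 26.0), ("795.1", 99.0)]:
        print(gene, is_strain_specific(identity), may_cross_hybridize(identity))
```

A protein with, for example, 72% identity would be classed as specific by the first function and flagged by the second, which is the case the text asks to keep in mind when interpreting a hybridization result.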

5. Examples of annotations

5.1. Genes specific to L. pneumophila Lens strain. There is no significant similarity
between the nucleotide sequence of the gene of L. pneumophila Lens strain and the
genome of L. pneumophila Paris strain.

ID of L. pneumophila    ID of L. pneumophila    % identity of        % identity of
Lens strain gene        Paris strain gene       protein sequences    nucleotide sequences
                        (best score)
2795.1
560.1
116.1
3866.1                  2661.2                  26%
2141.1                  152.3                   24%                  not significant
5.2. Genes common to the two strains for which the similarity (identity) of the deduced
protein sequences is less than 75%, and value of the similarity at the nucleotide level.

ID of L. pneumophila    ID of L. pneumophila    % identity of        % identity of
Lens strain gene        Paris strain gene       protein sequences    nucleotide sequences
                        (best score)
2518.1                  5987.2                  42%                  32%
3820.1                  3661.4                  42%                  15%

5.4. Genes common to L. pneumophila Lens strain and Paris strain.

ID of L. pneumophila    ID of L. pneumophila    % identity of        % identity of
Lens gene               Paris gene              protein sequences    nucleotide sequences
                        (best score)
795.1                   3838.3                  99%                  98%
2457.1                  3282.3                  100%                 98%
Example 8: Evidence in the genome of Legionella pneumophila for the exploitation of
functions of the host cell and for high genomic plasticity

Legionella pneumophila, the causative agent of Legionnaires' disease, replicates as an
intracellular parasite of amoebae and persists in the environment as a free-living microbe.
Analyzed here are the complete genomic sequences of L. pneumophila Paris (3 503 610 bp;
3 077 genes), an endemic strain predominant in France, and L. pneumophila Lens (3 345 687 bp;
2 932 genes), an epidemic strain responsible for a major epidemic in France. A striking
characteristic of the genome of L. pneumophila is its plasticity. Three different plasmids
were identified, and ~13% of each genome differs from that of the other strain. The Paris
strain codes for a unique type V secretion system, and its type IV secretion system Lvh is
coded by a region of 36 kb which can either be carried on a multicopy plasmid or be integrated
into the chromosome. Genetic mobility may be a mechanism which increases the versatility of
L. pneumophila. A large number of genes code for eukaryotic-type proteins or motifs predicted
to modulate the functions of the host cell to the advantage of the pathogen, including
tetratricopeptide repeats, ankyrin repeats, F-box domains, serine/threonine protein kinases,
apyrases and a sphingosine-1-phosphate lyase. The genome therefore reflects the history and
the lifestyle of L. pneumophila, a human pathogen of macrophages which has co-evolved with
freshwater amoebae.

L. pneumophila is the causative agent of legionellosis, an atypical pneumonia which can be
fatal if it is not rapidly treated (1). This Gram-negative facultative intracellular pathogen
can adapt both to the aquatic environment and to the intracellular medium of the phagocytic
cells of the human host (2). When inhaled in contaminated aerosols, L. pneumophila can reach
the alveoli of the lungs, where the bacteria are engulfed by macrophages. Unlike the majority
of bacteria, which are destroyed, L. pneumophila can multiply within the phagosome of the
macrophage and eventually kill the macrophage, resulting in legionellosis (3).

L. pneumophila and other legionellae are inhabitants of natural aquatic biotopes and
artificial water systems, such as the cooling towers of air conditioners (3). Legionellae
were detected by culture in 40% of freshwater environments and by PCR in up to 80% of
freshwater environments, where the bacteria are known to survive and replicate intracellularly
in free-living protozoa, often within aquatic biofilms (4). Their capacity for exploiting the
basic cellular mechanisms of a broad spectrum of protozoan eukaryotic hosts likewise allows
legionellae to infect human cells (5). In fact, it was shown that the capacity of
L. pneumophila to multiply intracellularly in amoebae contributes to the disease, even though
little is known about the mechanisms governing the host-microbe interactions. In contrast,
the biphasic life cycle of L. pneumophila, the change of replicating parasitic cells into
transmissible extracellular forms and the complex regulatory network which governs these
changes are understood in part (6).

The Legionella genus comprises 48 species, but more than 90% of the cases of clinical
legionellosis are caused by L. pneumophila; even more strikingly, up to 84% are caused by
serogroup 1 of L. pneumophila (1). We have determined the complete genomic sequences of two
clinical isolates of serogroup 1, the Paris and Lens strains, to provide knowledge of the
genetic characteristics of L. pneumophila and to identify the properties which were selected
in the niches specific to the pathogenicity and the life cycle of L. pneumophila. The Paris
strain is the only endemic strain known to date, accounting for 12.7% of the cases of
legionellosis in France and for 33% of those occurring in the Paris region (8). It is
associated with nosocomial and community-acquired disease occurring in the form of epidemics
or sporadic cases. From November 2003 to January 2004 the Lens strain caused an epidemic of
86 cases resulting in 17 deaths in the north of France, suggesting that it is particularly
efficient at causing disease in humans. Genomic comparison of an endemic isolate and an
epidemic isolate provides the basis for understanding strain specificity, and can give clues
to the particular adaptability and stability of the Paris strain.
Results

General characteristics

The Paris strain and the Lens strain of L. pneumophila each contain a circular chromosome,
of 3 503 610 bp and 3 345 687 bp respectively, with an average G+C content of 38% (Table XXII,
Figure 1, GenBank/EMBL accession numbers CR628336, CR628337). L. pneumophila Paris harbours
a plasmid of 131 885 bp and the Lens strain contains another plasmid of 59 832 bp
(GenBank/EMBL accession numbers CR628338, CR628339). In the chromosome of L. pneumophila
Paris, 3 077 genes were identified, and 2 932 in that of L. pneumophila Lens. No function
could be predicted for 42.1% (1354) of the genes of L. pneumophila Paris and 44.1% (1320) of
the genes of L. pneumophila Lens, a proportion similar to that found in the majority of the
other bacterial genomic sequences. A high proportion of the predicted genes (21% for the
Paris strain, 20.4% for the Lens strain) is unique to the Legionella genus and may thus code
for functions specific to Legionella.
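The G+C content quoted above is the percentage of G and C bases among the unambiguous bases of the sequence. This is a minimal sketch in Python, assuming that ambiguous symbols such as N are excluded from the denominator; the function name is illustrative.

```python
def gc_content(sequence):
    """Return the G+C percentage of a nucleotide sequence.

    Only A, C, G and T are counted; ambiguous symbols (for example N) are
    ignored. Raises ValueError if no unambiguous base is present.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    print(gc_content("atgcatgcnn"))
```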

Exploitation and modulation of the functions of the host cell

A fascinating question is how Legionella subverts the functions of the host in order to
enter, survive in, replicate in and escape from amoebae or alveolar macrophages. Within its
genome L. pneumophila codes for an abundance of eukaryotic-type proteins. Indeed, 30 genes
code for proteins having their highest similarity with eukaryotic proteins (Table XXIII) and
32 genes code for proteins with eukaryotic domains implicated in protein-protein interactions
(Table XXIV). We describe here proteins predicted to subvert eukaryotic regulatory pathways
or to be secreted into eukaryotic cells, making them strong candidates for directing the
invasion of Legionella, its trafficking within the host cell, or the modulation or subversion
of functions of the host cell.

Tetratricopeptide repeats (TPR) are degenerate repeat motifs of 34 amino acids present in
tandem arrays of 3 to 16 motifs, which form scaffolds for mediating protein-protein
interactions. TPR proteins contribute to control of the cell cycle, repression of
transcription, the stress response, inhibition of protein kinases, mitochondrial and
peroxisomal protein transport and neurogenesis (9). The Sel-1 repeats represent a sub-family
of TPR sequences. In L. pneumophila five proteins containing Sel-1 domains were identified.
Two of them (EnhC and LidL) were previously implicated in interaction with the host cells or
in the early signaling events which regulate the trafficking decisions of L. pneumophila in
macrophages (10, 11). Consequently, the three newly identified proteins are likewise in all
likelihood implicated in host-pathogen interactions (Table XXIV).

After internalization, L. pneumophila manipulates the endosomal-lysosomal 
degradation pathway of the host in order to survive and replicate within a vacuole 
derived from the endoplasmic reticulum (ER). A protein of L. pneumophila, RalF, 
thought to contribute to recruitment of the ER, contains a eukaryotic Sec7 domain. 
RalF, a substrate of the type IV secretion system, is required for the regulatory 
protein ARF to associate with the phagosomes of L. pneumophila 12. The two strains of 
L. pneumophila encode three eukaryotic-like serine/threonine protein kinases (STPK) 
(Table XXIV). Multiple sequence comparisons of the kinase domains from the 
Paris strain of L. pneumophila and from other prokaryotic and eukaryotic STPKs 
revealed that Lpp2626 and Lpp1439 of L. pneumophila cluster in the group of 
eukaryotic STPKs, close to the STPKs of Entamoeba histolytica (Figure 2). 
Mycobacterium tuberculosis, which like L. pneumophila blocks phagosome-lysosome 
fusion, produces eleven eukaryotic-like STPKs 13. In particular, the STPK PknG of M. 
tuberculosis is an inhibitor of phagosome-lysosome fusion and a promoter of 
intracellular survival 14. The STPK domain of Lpp0267/Lpl0262 is related to PknG 
and to the STPK YpkA of Y. pseudotuberculosis (Figure 2), an enzyme which is 
translocated into eukaryotic cells, where it counteracts the defenses of the host by 
interfering with eukaryotic signal transduction pathways 15. This suggests that the 
STPKs of L. pneumophila may likewise modulate eukaryotic signal transduction 
mechanisms and may modify the trafficking pathways of the host cell. 

Twenty proteins contain ankyrin domains (Table XXIV), tandem repeats of around 
33 amino acids which represent one of the most common modular protein-protein 
interaction motifs, found mainly in eukaryotes. To date, the only 
prokaryotic genomes known to encode large families of ankyrin domain proteins 
are those of Coxiella burnetii and Wolbachia pipientis, which encode 13 and 23 members, 
respectively 16,17. Like L. pneumophila, C. burnetii is an intracellular pathogen 
which is extremely well adapted to life inside the eukaryotic phagolysosome, and W. 
pipientis is an endosymbiont "parasite" living in the reproductive cells of a large 
variety of arthropods. Ankyrin domains may therefore be involved in a common microbial 
mechanism for manipulating the physiology of the host cell. A possible function of the 
ankyrin repeat-containing proteins of L. pneumophila is to modify 
interactions with the host cytoskeleton, given that many eukaryotic 
ankyrin proteins are thought to be linkers between membrane proteins and the 
cytoskeleton 18 and are important for targeting proteins to the plasma membrane or to 
the endoplasmic reticulum. Ankyrin domains are likewise components of transcriptional 
regulators, suggesting that they may influence the expression of host cell genes, 
as proposed for Ehrlichia phagocytophila 19. Indeed, one of the ankyrin repeat 
proteins of L. pneumophila (Lpp3991/Lpl0559) also contains a eukaryotic SET 
domain, known to bind host chromatin and to influence the expression of 
host cell genes 20. In contrast to the ankyrin proteins of W. pipientis, none of the 
ankyrin repeat proteins of L. pneumophila contains a signal peptide 17. Instead, 
some of them could be secreted by the type IV secretion system, a pathway 
which is independent of the typical targeting signals. 

The final stages in the intracellular life cycle of L. pneumophila are to kill and 
escape from its host cell, a mechanism which is still not understood. One class of 
proteins which may affect the control of host cell division (Lpp2082/Lpl2072, 
Lpp2486 and Lpp0233/Lpl0234) carries eukaryotic F box domains, sites of 
protein-protein interaction. Generally, F box domains 
are associated with other interaction domains 21. Similarly, two of the 
identified F box proteins are associated with an ankyrin repeat or a coiled-coil 
motif, respectively (Table XXIV). F box proteins, assembled in 
SCF ubiquitin-ligase complexes, determine which substrates are targeted for 
ubiquitination and subsequent proteolysis by the proteasome. Given that the targeted 
substrates include promoters and inhibitors of the cell cycle as well as signal 
transduction components 22, F box proteins may regulate cell division and 
differentiation. To our knowledge, the only prokaryotic F box protein described is 
VirF of Agrobacterium tumefaciens, a protein which is thought to interact with 
host proteins by means of its F box domain to target them for proteolysis 23. Another 
motif involved in eukaryotic ubiquitination is the U box motif. The protein 
Lpp2887, present in the Paris strain but not in the Lens strain, contains such a motif. It 
is apparently the first recognized in a prokaryotic organism. 

Additional eukaryotic-like proteins identified in the genome of the two strains 
of L. pneumophila are a sphingosine-1-phosphate lyase (Lpp2128/Lpl2102) and two 
secreted apyrases (Lpp1033, Lpp1880/Lpl1000, Lpl1869), suggesting that L. 
pneumophila modulates the host cell cycle to its advantage. The widely 
expressed enzyme sphingosine-1-phosphate lyase catalyses the essentially irreversible 
cleavage of the signaling molecule sphingosine-1-phosphate, a product of 
sphingomyelin degradation which regulates cell proliferation and cell death 
in eukaryotes. Indeed, overexpression of sphingosine-1-phosphate lyase can 
induce apoptosis in eukaryotes, identifying this enzyme as a dual modulator of 
sphingosine-1-phosphate and ceramide metabolism, as well as a regulator of 
cell fate decisions 24. In addition, sphingosine-1-phosphate lyase plays a 
central role in the development of the amoeba Dictyostelium discoideum, given that 
disruption of the gene results in aberrant actin distribution, an abnormal 
morphogenetic phenotype and increased viability during the stationary phase 25. The 
two apyrases of L. pneumophila are the only ones identified in prokaryotes. The 
apyrase protein family comprises enzymes capable of cleaving nucleotide tri- and 
diphosphates (NTP and NDP) in a calcium- or magnesium-dependent 
manner. Apyrase has been isolated from the autophagy vacuole 26, suggesting that these 
two proteins influence the fate of the L. pneumophila phagosome by decreasing the 
concentration of NTP and NDP during parasitism of the cell. 

246 proteins (7.6 %) in the Paris strain and 231 proteins (7.7 %) in the Lens 
strain have likewise been identified with predicted coiled-coil (CC) domains, many 
of which also show weak similarities to eukaryotic proteins. CC domains 
mediate protein-protein interactions, either for protein multimerization or for 
macromolecular recognition. Coiled-coil domains may therefore target proteins to 
the appropriate location in the eukaryotic host. Interestingly, all the new substrates 
(SidA-H, SdeC) of the type IV secretion system identified by Luo and Isberg 27, as well 
as LidA, LepA and LepB, contain coiled-coil domains. 

To affect the eukaryotic cell, L. pneumophila must translocate these 
eukaryotic-like proteins into the host cytoplasm. Consequently, these proteins are 
candidate substrates for the type IV secretion system, as shown for VirF in A. 
tumefaciens 28 or RalF in L. pneumophila 12. 

Secretion system 

Central to the pathogenesis of L. pneumophila are the dot/icm loci, which 
together direct the assembly of a type IV secretion apparatus. Although both 
strains contain the complete dot/icm loci, their sequences vary. Indeed, 
comparison of the sequences of the dot/icm genes of 18 different strains of L. pneumophila 
identified a wide range of sequence variation and placed the strains in seven 
different phylogenetic groups 29. However, no correlation with virulence is apparent. 

A novel putative virulence factor of L. pneumophila, restricted to the Paris 
strain, is a predicted autotransporter protein. Lpp0779 contains several hallmarks of 
type V secretion systems, including an N-terminal leader peptide for secretion 
across the inner membrane and a dedicated C-terminal domain which forms a pore 
in the outer membrane through which the passenger domain passes to the surface of the 
cell 30. Its passenger domain is composed of hemagglutinin repeats known 
to be involved in cell-cell aggregation and highly similar to those of the 
autotransporters AIDA-I and Ag43 of Escherichia coli, two proteins implicated in 
virulence. The bacterial surface protein AIDA-I mediates adherence to mammalian 
cells 31, while Ag43 not only confers a low level of adhesion to certain mammalian cells 
but also mediates auto-aggregation, which is important for biofilm 
formation 32. Similarly, the autotransporter of L. pneumophila may be 
involved in adhesion to the host cell and in biofilm formation. In contrast 
to AIDA-I and Ag43, the autotransporter of L. pneumophila does not have an RGD 
motif involved in binding to human integrins, and may thus have another 
interaction domain. The autotransporter was probably acquired by horizontal 
gene transfer, as suggested by the numerous IS elements upstream of it and by its GC 
content of 41 %, which exceeds the genome average of 38 %. Studies of the distribution 
of this gene in clinical and environmental strains of L. pneumophila, in combination 
with the study of its function, should provide insight into its importance. 
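The G+C argument used above (a region at 41 % against a genome average of 38 %) can be illustrated with a minimal Python sketch. The function names and the 2-percentage-point threshold are illustrative assumptions and are not taken from the analysis described here.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction (0.0 to 1.0) of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def deviates_from_genome(region_gc: float, genome_gc: float,
                         threshold: float = 0.02) -> bool:
    """Flag a region whose G+C fraction differs from the genome average
    by more than `threshold` (hypothetical cut-off chosen for illustration)."""
    return abs(region_gc - genome_gc) > threshold


# Values quoted in the text: autotransporter region 41 %, genome average 38 %.
print(deviates_from_genome(0.41, 0.38))
```

A deviation of this kind is only suggestive; in the text it is combined with the presence of upstream IS elements before horizontal transfer is proposed.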

In addition, the two strains of L. pneumophila contain a twin-arginine 
translocation (Tat) secretion pathway (TatAB and TatC), which complements the 
type I and type II secretion systems. The type II system encoded by the genes lspA, lspD-J 
is required for the secretion of several enzymes such as lipases A and B (Lpp0533, 
Lpp1159/Lpl0509, Lpl1164), the acid phosphatases Map and SurE (Lpp1120, 
Lpp1245/Lpl1124, Lpl1245), lysophospholipase A (Lpp2291/Lpl2264) and 
phospholipase PlaB (Lpp1568/Lpl422), proteins which are all present in both strains. 

Metabolism 

The metabolic pathways used by L. pneumophila to multiply inside 
eukaryotic cells are not known. The bacterium seems to prefer protein substrates, 
given that a large number of uptake and degradation systems for oligopeptides and 
amino acids are encoded in the genome. In particular, apart from ProA 
(Lpp0532/Lpl0508), a homologue of the elastase of Pseudomonas aeruginosa, three 
paralogous secreted metalloproteases as well as 46 additional peptidases are present. 

In contrast, sugar uptake systems are rare, even though 
the complete Embden-Meyerhof and Entner-Doudoroff pathways are present. No 
PTS-type uptake system was identified in either strain of L. pneumophila. However, 
some of the 55 ABC-type transport systems may be involved in sugar uptake, 
given that the bacterium has several systems for the degradation of complex sugars, 
such as trehalase, polysaccharide deacetylase, a eukaryotic-type glucoamylase 
(Lpp0489/Lpl0465), β-hexosaminidase and chitinases (Lpp1117/Lpl1121). The two 
strains encode proteins highly homologous to ABC transporters of glycerol phosphate 
(Lpp1696, Lpp1695, Lpp1694/Lpl1695, Lpl1694, Lpl1693), and a hexose phosphate 
transporter (Lpp2623/Lpl2474), which may be important during intracellular growth. We 
have likewise identified several enzymes probably involved in the utilization of 
myo-inositol, which may interfere with host cell signaling mediated by this 
intracellular messenger. 

L. pneumophila is predicted to encode an extensive aerobic respiratory chain 
consisting of NADH dehydrogenase, cytochrome-dependent succinate 
dehydrogenase, ubiquinol-cytochrome reductase and four terminal oxidases, which 
ensure the capacity to adapt to changing oxygen tensions (one cytochrome aa3, two 
quinol-cytochromes of type bd and one quinol cytochrome oxidase of type o). The latter 
oxidase is absent in the Lens strain. Systems involved in anaerobic respiration are 
apparently absent in both strains. The two genomes encode an ATP synthase of type 
F0F1, typical of γ-proteobacteria, whereas the Paris strain encodes a second ATP 
synthase similar to the uncharacterized systems of archaea and marine 
bacteria. L. pneumophila likewise encodes at least four sodium/proton antiporters 
(Lpp1464, Lpp2448, Lpp0868, Lpp0667/Lpl1519, Lpl2304, Lpl0839, Lpl0651), which 
presumably modulate the proton and sodium gradient across the cytoplasmic membrane. 
Consequently, a sodium motive force may be used for cellular activities. In this 
respect, the presence of a component of sodium-type polar flagellar motors, MotY, 
as well as two significantly different MotA-MotB gene clusters, leads to the prediction 
that motility can be driven by the sodium motive force as well as by the proton motive force. 
One particular characteristic of L. pneumophila is differentiation into a mature 
intracellular form which accumulates inclusions of poly-hydroxybutyrate as a 
carbon and energy reserve. Accordingly, the Paris strain encodes four paralogous 
poly-hydroxybutyrate synthases and the Lens strain encodes three 
(Lpp2323, Lpp2038, Lpp2214, Lpp0650/Lpl1055a-b, 
Lpl2186, Lpl0634). 

Physiological adaptation and gene regulation 

In keeping with this intracellular lifestyle, the regulatory repertoire is rather small. 
Analysis of the genome identified 92 transcriptional regulators (79 in the Lens strain), 
which represent only 3.0 % of the predicted genes. L. pneumophila encodes six putative 
sigma factors, the homologues of rpoD, rpoH, rpoS, rpoN and fliA and the ECF-type sigma 
factor rpoE. The number of two-component systems (13 histidine kinases 
and 14 response regulators) is likewise low. 

The most abundant class of regulators belongs to the GGDEF/EAL family (23). 
Present in numerous bacteria, including Vibrio cholerae (41), P. aeruginosa (33), 
Wolinella succinogenes (26) and E. coli (19), these regulators contain two 
sub-domains, GGDEF and EAL. Of the 23 regulators identified, 10 carry only a 
GGDEF domain, 3 in the Paris strain and 2 in the Lens strain contain only an EAL domain, 
and 10 in the Paris strain and 11 in the Lens strain carry a combination of the two. 
The role of these regulators in L. pneumophila is unknown, but in other bacteria they 
play a role in aggregation, biofilm formation or twitching motility. 
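The domain-architecture counts given above can be checked with a short tally. This sketch assumes the reading that the 3 (Paris) and 2 (Lens) EAL-containing regulators carry an EAL domain only; the dictionary layout is illustrative.

```python
# Counts of GGDEF/EAL regulators per domain architecture, as quoted in the text.
# "EAL only" is an assumed reading of "contain an EAL domain".
ggdef_eal = {
    "Paris": {"GGDEF only": 10, "EAL only": 3, "GGDEF+EAL": 10},
    "Lens": {"GGDEF only": 10, "EAL only": 2, "GGDEF+EAL": 11},
}

# Sum the three architectures for each strain.
totals = {strain: sum(counts.values()) for strain, counts in ggdef_eal.items()}
print(totals)  # {'Paris': 23, 'Lens': 23}
```

Under this reading both strains total the 23 regulators stated for the family.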

In L. pneumophila, cyclic AMP may likewise transduce cellular signals, 
given that the genome encodes five class III adenylate cyclases (Lpp1446, Lpp1131, 
Lpp1704, Lpp1277, Lpp0730/Lpl1538, Lpl1135, Lpl1703, Lpl1276, Lpl0710). In P. 
aeruginosa, CyaB, a class III adenylate cyclase, is involved in the regulation of 
virulence genes 33. However, L. pneumophila does not contain an orthologue of Vfr, 
the cAMP-dependent regulator of P. aeruginosa, but it encodes five proteins with 
cAMP-binding motifs (Lpp3069, Lpp1482, Lpp2063, Lpp0611, Lpp2777/Lpl2926, 
Lpl1501, Lpl2053, Lpl0592a-b-c, Lpl2648). As in P. aeruginosa, these class III 
adenylate cyclases may sense environmental signals ranging from the nutrient 
content of the surrounding medium to the presence of host cells and may control the 
expression of virulence genes accordingly. 

High plasticity of the genome of L. pneumophila 

The two genomes exhibit astonishingly high plasticity and 
diversity. Comparison of the chromosomes identified a conserved backbone of 2664 
genes, but 280 genes in the Lens strain and 428 genes in the Paris strain (10 and 14 %, 
respectively) are strain-specific (Figure 3, Table XXII). Given 
that the two strains analyzed belong to the same species and the same serogroup, this 
diversity was unexpected. For example, comparison of the genomes of two strains of 
Salmonella enterica serotype Typhi identified only 2 % strain-specific genes in 
each 34. The specific gene complement of the Paris strain contains a 
number of regulators (three homologues of CsrA, 13 transcriptional regulators), 
additional ankyrin repeat proteins and eukaryotic-like proteins (Table XXIII and 
Table XXIV) as well as several restriction-modification genes (DNA modification 
methylases, endonucleases), which may explain the low competence (personal 
observation) and the high genomic stability 8 of the Paris strain. The Lens strain contains 
fewer specific regulators (4) and four specific proteins with eukaryotic domains (Table 
XXIII), two of which are ankyrin repeat proteins, suggesting that the 
Paris strain is a particularly well-equipped strain. 
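The rounded shares of strain-specific genes can be reproduced from the counts in Table XXII. In this sketch the denominator is taken as the conserved backbone plus the strain-specific genes of each strain, which is an assumption; the source does not state how the percentages were computed.

```python
# Gene counts from Table XXII.
core_genes = 2664
strain_specific = {"Paris": 428, "Lens": 280}

shares = {}
for strain, n_specific in strain_specific.items():
    # Assumed denominator: backbone genes plus this strain's specific genes.
    shares[strain] = 100 * n_specific / (core_genes + n_specific)
    print(f"{strain}: {shares[strain]:.1f} % strain-specific")
```

Rounded to whole percentages, the two values agree with the 14 % (Paris) and 10 % (Lens) quoted in the text.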

The genomes of L. pneumophila have undergone multiple 
rearrangements. The high genome-wide synteny between the Paris and Lens strains is 
interrupted by an inversion of 260 kb, an insertion of 130 kb in the Paris strain (or 
deletion in the Lens strain) and by multiple smaller deletions and insertions. The 
130 kb fragment is flanked by a tRNA gene and encodes a putative integrase, suggesting a 
structure similar to the pathogenicity islands of enterobacteria. It contains an ATP 
synthase, chemiosmotic efflux systems (cebABC, cecABC) and the genes cadA1, ctpA, 
copA1, copA2 coding for ATP-dependent efflux pumps, which were shown to be induced in 
macrophages 35, and carries the prpA-lvrABC gene cluster, present within a 
pathogenicity island of 65 kb in the Philadelphia strain 36. With the exception of the 
abovementioned genes, this pathogenicity island is absent from the Paris strain and from 
the Lens strain. However, the corresponding chromosomal site in the Paris strain is the 
insertion site of an integrative plasmid discussed below. These two regions 
may therefore be hot spots for genomic rearrangements. Genomic variation is likewise 
evident from the array of mobile elements, comprising ten integrases, 58 insertion 
sequences (34 complete and 24 truncated) as well as phage-related proteins. In addition, 
the genomes contain a large number of repetitive sequences organized as 
inverted repeats, which are reminiscent of the ERIC sequences of enterobacteria. 
These LeRIC (Legionella Repetitive Intergenic Consensus) sequences fall into 7 classes 
present in numerous copies (for example 80, 18, 18, 25, 9, 9 and 6 in the Paris strain). 

L. pneumophila Paris and Lens contain lvh, a region which encodes a second 
type IV secretion system previously characterized in the Philadelphia strain 37. One 
interesting observation is that the lvh region of L. pneumophila Paris is located on a 
region of 36 kb which exists either integrated into the chromosome or excised in the 
form of a multicopy plasmid (unpublished data). This pattern is similar to that described 
for the unstable 30 kb element of the Olda strain, which is possibly phage-derived 
and is involved in phase variation 38. The GC content of the lvh region (43 %) 
differs from that of the remainder of the chromosome (38 %) and the region contains 
some phage-related genes, suggesting a possible phage origin. However, the exact 
mode of excision and integration is still not understood. An attractive hypothesis is 
that the integration and excision of particular regions of the chromosome is a 
mechanism specific to L. pneumophila for increasing versatility. 

The second plasmid of the Paris strain (132 kb) carries known virulence 
factors, mobile genetic elements and antibiotic resistance genes. The two-component 
regulatory system lrpR-lskS present on this plasmid has been found on a plasmid of 
Legionella longbeachae involved in the virulence of that species 39. The high 
conservation (93 to 98 % protein identity) of six gene sequences on the 135 kb 
plasmid with those of L. longbeachae may indicate a recent horizontal transfer between L. 
pneumophila and L. longbeachae. L. pneumophila Lens contains a plasmid of 60 kb 
which encodes several proteins homologous to the transfer region of the F plasmid of 
E. coli. All three plasmids of the Paris and Lens strains encode a paralogue of CsrA, a 
repressor of transmission traits and activator of replication 40. 

Although the role of plasmids in the virulence of L. pneumophila has yet to 
be determined, a correlation between the presence of a plasmid and virulence in 
a mouse model has been described 41. In addition, L. pneumophila strains with plasmids 
seem to persist longer in the environment than strains without plasmids 42. The 
identification in clinical isolates of plasmids encoding putative virulence 
factors is another indication of the importance of plasmids for the pathogenicity 
of Legionella. 

The genomes of L. pneumophila also display plasticity at the gene level. 
The enh loci, involved in entry into host cells 43, are present in the Paris and Lens 
strains. One of the proteins encoded by these loci is RtxA, which contributes to entry, 
adherence, cytotoxicity, pore formation 43 and intracellular trafficking in amoebae 44. 
Unlike in the AA100 strain 10, in the Paris strain rtxA is fused with arpB and a second 
protein with about 30 highly conserved tandem repeats of 549 bp. A 
similar structure is encoded by the Lens strain; however, we have identified two motifs 
within the repeat region, both of which differ from that of the Paris strain. 
The number of repeats nevertheless seems to be the same (Figure 4). It is possible that 
variations in the number and sequence of the repeats contribute to the 
multiplicity and likewise to the virulence of L. pneumophila. 
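As a rough illustration of how the copy number of a tandem repeat can be counted, the following Python sketch finds the longest run of exact back-to-back copies of a repeat unit. It is a simplification: the repeats described above are highly conserved but not necessarily identical, so exact matching is an assumption, and the function name is hypothetical.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Return the largest number of consecutive exact copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    i = 0
    step = len(unit)
    while i <= len(seq) - step:
        if seq.startswith(unit, i):
            run = 0
            # Walk forward one unit at a time while copies follow each other.
            while seq.startswith(unit, i):
                run += 1
                i += step
            best = max(best, run)
        else:
            i += 1
    return best


# Toy example with a 3-bp unit; the real repeat unit is 549 bp.
print(max_tandem_copies("AAGTCGTCGTCAA", "GTC"))  # 3

# About 30 copies of 549 bp correspond to roughly 16.5 kb of repeat region.
repeat_region_bp = 30 * 549
```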

In accordance with the relative plasticity of their genomes, the strains of L. 
pneumophila encode type IV pili, which are required for natural 
competence for DNA transformation 45. The organization of the genes encoding 
type IV pili is similar to that of P. aeruginosa, where they are crucial for 
bacterial adherence to and colonization of mucosal surfaces and for twitching 
motility. Another mechanism in L. pneumophila contributing to the 
plasticity of the genome is conjugative transfer, mediated by type IV secretion, of 
plasmids 46 and chromosomal DNA 47. 

Conclusion 

Analysis of the genome sequences of the clinical L. pneumophila strains Paris 
and Lens and their comparison identifies L. pneumophila as an extremely versatile 
organism, which displays extensive genomic plasticity and diversity. The 
excision and integration of plasmids or genes may be a mechanism which L. 
pneumophila exploits to adapt to different environments. Its large cohort of 
eukaryotic-like proteins is predicted to manipulate the host cell to the advantage of the 
pathogen (Figure 5). Proteins of putative eukaryotic origin have likewise been identified 
in other intracellular pathogens, including Coxiella, Wolbachia, Agrobacterium, 
Mycobacterium and Ehrlichia, but L. pneumophila is currently the sequenced 
prokaryote with the greatest variety of eukaryotic-like proteins. Presumably, during 
its co-evolution with free-living amoebae, the pathogen L. pneumophila acquired DNA by 
horizontal transfer from its host or by convergent evolution. These proteins may then 
likewise contribute to the infection of human macrophages. Based on the 
genome sequences, future comparative and functional studies will make it possible to 
define the survival tactics of intracellular parasites and to identify the special 
attributes of endemic and epidemic L. pneumophila. To combat the threat of L. 
pneumophila resistant to the chemical products widely used for decontaminating public 
water systems, including those of hospitals, the genomic sequence may stimulate the 
identification of targets for novel biocides active against L. pneumophila. 



Methods 




Preparation and sequencing of the DNA. The Paris and Lens strains of L. 
pneumophila were grown on BCYE agar at 37°C for 3 days and the chromosomal 
DNA was isolated using standard protocols. Cloning, sequencing and 
assembly were carried out as described previously 48. For both genomes two 
libraries (inserts of 1-2 kb and 2-3 kb) were generated by random mechanical shearing 
of the genomic DNA and cloning in pcDNA-2.1 (Invitrogen). A scaffold was obtained by 
end-sequencing of clones from a BAC library constructed as described previously 49 
using pIndigoBac (Epicentre) as vector. For the Paris strain of L. pneumophila a 
library of medium-sized inserts (8-10 kb) was constructed in the low copy-number 
vector pSYX34. Plasmid DNA was purified either with the Montage 
Plasmid Miniprep96 Kit (Millipore) or by using the TempliPhi DNA sequencing template 
amplification kit (Amersham Biosciences). The sequencing reactions were 
carried out using the ABI PRISM BigDye Terminator sequencing reaction kit and a 
3700 or a 3730 Xl Genetic Analyzer (Applied Biosystems). 47,200 
sequences for the Paris strain of L. pneumophila and 47,231 sequences for the Lens 
strain, each originating from four libraries, were obtained, assembled and finished as 
described previously 48. 

Annotation and analysis. The definition and annotation of the coding sequences 
(CDS) were carried out as described previously 48 using the CAAT-Box software 50. All 
the predicted CDS were checked visually. The function predictions were based on 
BLASTp similarity searches and on analysis of motifs using the PFAM, Prosite 
and SMART databases. We identified orthologous genes by reciprocal best-match 
BLAST and FASTA comparisons. For identification of the 
coiled-coil domains the publicly available software PairCoil and Coilscan were used. 
Pseudogenes have one or more mutations which prevent complete translation. 
Repetitive DNA sequences were identified by BLASTN comparisons of the 
intergenic regions and of the complete genome. MFOLD software was used to 
predict the folding of single-stranded DNA molecules. 
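The reciprocal best-match criterion mentioned above for orthologue identification can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names are hypothetical, and the similarity scores in the example are invented for demonstration only. The gene identifiers are taken from the text (the two apyrase pairs and the Paris-specific Lpp2887).

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): similarity_score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented scores, for illustration only.
paris_vs_lens = {
    ("lpp1033", "lpl1000"): 900, ("lpp1033", "lpl1869"): 300,
    ("lpp1880", "lpl1869"): 850, ("lpp1880", "lpl1000"): 310,
    ("lpp2887", "lpl1000"): 50,
}
lens_vs_paris = {
    ("lpl1000", "lpp1033"): 900, ("lpl1000", "lpp1880"): 310,
    ("lpl1869", "lpp1880"): 850, ("lpl1869", "lpp1033"): 300,
}
orthologs = reciprocal_best_hits(paris_vs_lens, lens_vs_paris)
print(orthologs)
```

In this toy data set the two paralogous apyrases are each paired with their counterpart, while the weak one-way hit of lpp2887 is not reported as an orthologue.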

URL. The sequence and annotation of the two genomes of L. pneumophila are available at 
http://genolist.pasteur.fr/LegioList. For annotation and analysis we used PairCoil 
(http://paircoil.lcs.mit.edu/cgi-bin/paircoil) and Coilscan 
(http://www.biology.wustl.edu/gcg/coilscan.html). 

Table XXII: General characteristics of the two Legionella pneumophila genomes 

                                                  L. pneumophila Paris   L. pneumophila Lens
Size of the chromosome (bp)                       3 503 610              3 345 687
G+C content                                       38.3% (37.4%)          38.4% (38.4%)
G+C content of protein-coding genes               39.1% (37.9%)          39.4% (39.1%)
Total number of protein-coding genes              3076 (141)             2931 (57)
Average length of protein-coding genes (codons)   331                    333
Number of rRNA operons (16S-23S-5S)               3                      3
Number of tRNA genes                              43                     43
Percentage coding                                 87.9% (92%)            88% (83.7%)
Plasmid                                           1 (131.9 kb)           1 (59.8 kb)
Number of strain-specific genes                   428 (125)              280 (44)
Number of orthologous genes                       2664                   2664


Table XXIII: Proteins having the greatest similarity with eukaryotic proteins 



L. pneumophila 
Paris 



Produit provided 



Z,./?.Lens G-C 



Percentage of protein identity 



lppl647 

lpp0702 

lpp0321 

lppll57 

lppl522 

lpp2832 

plppOOSO 

l PP 0634 

lp P 0965 

lpp2748 



purC 
exoA 

exodeoxyribonuclease III 
Precursor protein bond to 
RNA 

pyruvate decarboxylase 

protein de biosynthesis de 
thiamine NMT-1 
nuoE NADH 
dehydrogenase I chain E 
protein with hypersensitive 
induced response 

hypothetical protein 
protease 

phytanoyl-coA dioxygenase 

SS^^^^^l!- pn< >sp h a 



lpll640 


38% 


lpl0684 


39% 




34% 


lplll62 


39% 


\pll461 


38% 


lpl2701 


38% 




36% 


lpl0618 


39% 


lpl0935 


39% 


lpl2621 


36% 


Ip!2J02 





61% over the entire length (AAR06292.1 
Nicotania tabacum) 

58% over the entire length (EAA20230.1 
Plasmodium yoelii yoelii) 
45% of 50% of the protein (AAL07519 
Solarium tubeosum) 

50% over the entire length (AAB16855.1 

Arabidopsis thaliana) 

49% over the entire length (AAC64375.1 
Botryotinia fuckeliana) 

49% of 82% of the protein (BAA25988.1 
Homo sapiens) 

48% over the entire length (AAN1 7462.1 
Hordeum vulgare subsp. Vulgare) 
48% over the entire length (XP_306643.1 
Anopheles gambiae) 

45% over the entire length (NP_1 8943 1.2 

Arabidopsis thaliana) 

44% over the entire length (XP_3 72 144.1 

"""US 
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lpp0489 
lpp0955 
l PP 0578 
lpp0379 

lpp!033 

lp P 2923 
lpp3071 
lpp2134 

l PP 1880 

lpp2747 
lpp2468 
lppl824 
lp P 1665 
lppl959 
lpp0358 
lppll27 
lp P 1167 



glucoamylase 
cytokinin oxydase 
phytanoyl coA dioxygenase 

hypothetical protein 

ectonucleoside triphosphate 

diphosphohydrolase 

(apyrase) 

6-pyruvoyl-tetrahydropterin 
synthase 

zinc metalloproteinase 

methyltransferase bonded to 
SAM 

ectonucleoside triphosphate 

diphosphohydrolase 

(apyrase) 

methyltransferase bonded to 
SAM 

Cytochrome P450 

protein bound to the nuclear 
membrane 

uracyl DNA glycosylase 

condensation chromosome 
type 1 

hypothetical protein 
Ca2+-ATPase de transport 
uridine kinase 



lpl0465 
lpl0925 
lpl0554 
Lpl0354 



39% 
39% 
36% 
39% 



IpllOOO 40% 



lpl2777 
lpl2927 
lpl2109 



34% 
38% 
35% 



lpll869 39% 



lpl2620 
lpl2326 

lpll659 
lpll953 
lpl334 
lplll31 
lplll73 
lpl2481 
% 



35% 
39% 
34% 
36% 
41% 
38% 
37% 
33% 
32% 



32% over the entire length (P42042 Arxula adeninivorans) 
32% over the entire length (NP_484368.1 Nostoc sp.) 
31% over the entire length (EAA70100.1 Gibberella zeae) 
31% over the entire length (CAD21525.1 Taenia solium) 
25% over the entire length - nucleoside phosphatase signature (Q9MYU4 Sus scrofa) 
26% over the entire length (NP_703938.1 Plasmodium falciparum) 
38% over the entire length (AAF56122.1 Drosophila melanogaster) 
24% over the entire length (BAC98835.1 Bombyx mori) 
26% over the entire length (CAE70887.1 Caenorhabditis briggsae) 
33% of 56% of the protein (EAA20288.1 Plasmodium yoelii yoelii) 
20% of 75% of the protein (NP_487786.1 Nostoc sp.) 
19% of 40% of the protein (NP_082559.1 Mus musculus) 
21% of 80% of the protein (EAA36774.1 Giardia lamblia) 
Conserved pattern of a regulator of chromosome condensation 
37% of 53% of the protein (EAA20288.1 Plasmodium yoelii yoelii) 
22% of 34% of the protein (AAB81284.1 Paramecium tetraurelia) 
35% of 65% of the protein (AAM09314.2 Dictyostelium discoideum) 




Lpp indicates the coding sequences (CDS) from the L. pneumophila Paris strain; lpl 
indicates the coding sequences (CDS) from the L. pneumophila Lens strain. The lines in 
gray indicate the proteins that are also listed in Table 2B for their conserved 
eukaryotic domains. The protein accession numbers are given in parentheses. 

Table XXIV: Protein domains encoded by L. pneumophila that are found preferentially in 
eukaryotic proteins 



L. pneumophila Paris                    L. pneumophila Lens                     Identified motif                   GC content   Putative function 
enhC (lpp2692)                          enhC (lpl2564)                          21 Sel-1 domains                   39% 
lidL (lpp1174) - EnhC paralog           lidL (lpl1180) - EnhC paralog           6 Sel-1 domains                    jO/o 
lpp1310 - EnhC paralog                  lpl1307 - EnhC paralog                  4 Sel-1 domains                    41% 
lpp2174 - EnhC paralog                  (p//30J - EnhC paralog                  3 Sel-1 domains                    40%          Invasion and trafficking in host cells 
-                                       /p/70J9 - EnhC paralog                  7 Sel-1 domains                    45% 
ralF (lpp1932)                          ralF (lpl1919)                          Sec7 domain                        34% 
Ippvz 0 /                               lplU2o2                                 Ser/Thr protein kinase domain      38% 
lpp2626                                 lpl2481                                 Ser/Thr protein kinase domain      32% 
lpp1439                                 lpl1545                                 Ser/Thr protein kinase domain      36% 
lpp2065                                 lpl2055                                 ankyrin repeat                     37% 
lppuU37                                 lpl(J03o                                ankyrin repeat                     38% 
nlnr>0098                               -                                       ankyrin repeat                     5 1/0 
lpp2058                                 lpl2048                                 ankyrin repeat                     38% 
lpp0750                                 lpl0732                                 ankyrin repeat                     35% 
lpp2061                                 lpl2051                                 ankyrin repeat                     39% 
lpp2270                                 lpl2242                                 ankyrin repeat                     34% 
lpp0503                                 lpl0479                                 ankyrin repeat                     38% 
lpp1905                                 -                                       ankyrin repeat                     35%          Modulation of host cell functions 
ippi ooj                                ipii OO/                                anKynn repeiinon ^ aomam i!>xi i   d5 /o 
lpp2248                                 lpl2219                                 ankyrin repeat                     39% 
lpp0202                                 -                                       ankyrin repeat                     38% 
lpp0469                                 lpl0445                                 ankyrin repeat                     38% 
lpp2517                                 lpl2370                                 ankyrin repeat                     36% 
Inn J 100                               -                                       ankyrin repeat                     H-o /o 
lpp0126                                 lpl0111                                 ankyrin repeat                     39% 
lpp0356                                 -                                       ankyrin repeat                     38% 
lpp2522                                 lpl2375                                 ankyrin repeat                     39% 
lpp0547                                 lpl0523                                 ankyrin repeat                     40% 
-                                       lpl1681                                 ankyrin repeat                     34% 
-                                       lpl2344                                 ankyrin repeat                     35% 
-                                       lpl2058                                 ankyrin repeat                     40% 
lpp2082                                 lpl2072                                 F-box domain + ankyrin repeat      36% 
lpp2486                                 -                                       F-box domain + coiled coil         34% 
lpp0233                                 lpl0234                                 F-box domain                       39%          Control of division, evasion of host cells 
lpp2887                                 -                                       2 U-box domains                    35% 
lpp2128 Sphingosine-1-phosphate lyase   lpl2102 Sphingosine-1-phosphate lyase   -                                  41% 

Lpp indicates the coding sequences (CDS) from the L. pneumophila Paris strain; 
lpl indicates the coding sequences (CDS) from the L. pneumophila Lens strain; the 
numbers indicate the number of domains identified in a protein; Ser/Thr indicates 
serine/threonine. 

Example 9: Repeated DNA sequences and secreted enzymes characteristic of 
Legionella pneumophila 

A) Repeated sequence 

A repeated sequence was identified in the genome of the Legionella pneumophila Paris 
strain and then sequenced completely. This sequence (SEQ ID 7074) is 122 bp long and is 
repeated 86 times in the genome of the L. pneumophila Paris strain. Conservation is 81 
to 100% (0 to 19 mismatches) over the entire length for 53 copies and 70 to 80% over a 
length of at least 100 nucleotides for 33 copies. In the L. pneumophila Lens strain 
there are 62 copies, of which 29 are 81% to 95% conserved over the entire length. We 
designed oligonucleotides specific to this sequence for its amplification by PCR. Tests 
on 15 strains, each belonging to one of the serotypes of L. pneumophila (serotypes 1 to 
15), showed that this sequence is present in all serotypes of L. pneumophila. Testing 
11 strains belonging to other species of the genus Legionella (L. micdadei, L. dumoffii, 
L. gormanii, L. longbeachae serogroup 1, L. jordanis, L. anisa, L. erythra, 
L. rubrilucens, L. quinlivanii, L. moravica, L. taurinensis) with these 
oligonucleotides, together with database searches, showed that this sequence is 
specific to the species Legionella pneumophila and can therefore serve as an identifier 
of the species (see Tables XXV and XXVI and Figures 6 and 7). The high copy number of 
this sequence in the genome should improve the sensitivity of a PCR test or a 
hybridization test compared with a target present in a single copy. 

SEQ ID 7074: 

AGGACTTACGAAAAACCCCAAGATCAAGGCAAAAAATGTTTTTAATGAGG 
GAGTTTAGATAAACTAAATAACCGAATTAAAA 
ATTGGGATTTTTCGTAAGTCCT. 

This sequence is an attractive target for the diagnosis of L. pneumophila by PCR or by 
other methods equivalent to PCR. 
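The conservation figures quoted above pair a percent identity with a mismatch count. As a minimal sketch of how the two relate for two aligned copies of equal length, assuming a plain position-by-position comparison without gaps (the function name and the toy sequences are illustrative and are not taken from the genome):

```python
def identity(copy_a: str, copy_b: str) -> tuple[int, float]:
    """Return (mismatches, percent identity) for two aligned, gap-free sequences."""
    if len(copy_a) != len(copy_b) or not copy_a:
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions where the two copies differ (case-insensitive).
    mismatches = sum(a != b for a, b in zip(copy_a.upper(), copy_b.upper()))
    percent = 100.0 * (len(copy_a) - mismatches) / len(copy_a)
    return mismatches, percent


# Toy 10-nucleotide example (hypothetical sequences): one mismatch in ten positions.
print(identity("ACGTACGTAC", "ACGTACGTAA"))
```

Real copies of a repeat generally need an alignment step first, since insertions and deletions shift positions; this sketch deliberately leaves that out.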

Among the primers used for amplification and detection of this repeated sequence, the 
following primer pair can be cited in particular: 
SEQ ID N° 7075: GAAAAACCCCAAGATCAAGGC and 
SEQ ID N° 7076: AGGACTTACGAAAAACCCCAA. 
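As a rough sketch of how the two primers can be located on the copy of the repeat printed as SEQ ID 7074, the following scans both strands and tolerates a chosen number of mismatches. The three sequences are copied from the text; the function names and the mismatch parameter are assumptions made for illustration, and the sketch says nothing about the other copies of the repeat or about the actual assay conditions.

```python
# Sequences as printed in the text (SEQ ID 7074, 7075 and 7076).
REPEAT = (
    "AGGACTTACGAAAAACCCCAAGATCAAGGCAAAAAATGTTTTTAATGAGG"
    "GAGTTTAGATAAACTAAATAACCGAATTAAAA"
    "ATTGGGATTTTTCGTAAGTCCT"
)
PRIMER_7075 = "GAAAAACCCCAAGATCAAGGC"
PRIMER_7076 = "AGGACTTACGAAAAACCCCAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(primer: str, template: str, max_mismatches: int = 0):
    """List (strand, start, mismatches) for every window within the mismatch limit.

    '+' means the primer itself is found on the printed strand;
    '-' means its reverse complement is found on the printed strand.
    'start' is a 0-based coordinate on the printed strand in both cases.
    """
    hits = []
    n = len(primer)
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        for start in range(len(template) - n + 1):
            window = template[start:start + n]
            mismatches = sum(a != b for a, b in zip(probe, window))
            if mismatches <= max_mismatches:
                hits.append((strand, start, mismatches))
    return hits


for name, primer in (("SEQ ID 7075", PRIMER_7075), ("SEQ ID 7076", PRIMER_7076)):
    print(name, find_sites(primer, REPEAT, max_mismatches=2))
```

Raising `max_mismatches` is one way to see where a primer could still anneal on a slightly divergent copy; the threshold of 2 used here is arbitrary.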

Table XXV: List of the DNA of the 15 reference serogroups of Legionella pneumophila 

Name                                 Lab number   ATCC No. 
L. pneumophila sg1                   R1           ATCC 33152 
L. pneumophila sg3                   R2           ATCC 33155 
L. pneumophila sg2 Togus 1           R6           ATCC 33154 
L. pneumophila sg4 Los Angeles 1     R7           ATCC 33156 
L. pneumophila sg5 subsp fraseri     R8           ATCC 33216 
L. pneumophila sg6 Chicago 2         R10          ATCC 33215 
L. pneumophila sg7 Chicago 8         R12          ATCC 33283 
L. pneumophila sg8 Concord 3         R13          ATCC 33096 
L. pneumophila sg9 IN-23-G1-E2       R14          ATCC 35289 
L. pneumophila sg10 Leiden 1         R15          ATCC 43283 
L. pneumophila sg11 797-PA-H         R18          ATCC 43130 
L. pneumophila sg12 570-CO-H         R19          ATCC 43290 
L. pneumophila sg13 82A3105          R20          ATCC 43736 
L. pneumophila sg14 1169-MN-H        R21          ATCC 43073 
L. pneumophila sg15 Lansing 3        R22          ATCC 35251 

Table XXVI: List of the DNA of the 11 reference non-pneumophila Legionella species 

ATCC Number   Description                                                                             Strain Reference 
33218         Tatlockia micdadei Garrity et al. deposited as Legionella micdadei                      TATLOCK [CIP 103882; NCTC 11371] 
33279         Fluoribacter dumoffii (Brenner et al.) Brown et al. deposited as Legionella dumoffii    NY 23 
33297         Fluoribacter gormanii (Morris et al.) Brown et al. deposited as Legionella gormanii     LS-13 [ALL03] 
33462         Legionella longbeachae McKinney et al. serogroup 1                                      Long Beach 4 [NCTC 11477] 
33623         Legionella jordanis Cherry et al.                                                       BL-540 
35291         Legionella anisa Gorman et al.                                                          CH-47-C3 
35301         Legionella erythra Brenner et al.                                                       SE-32A-C8 [NCTC 11977] 
35304         Legionella rubrilucens Brenner et al.                                                   WA-270A-C2 [NCTC 11987] 
43830         Legionella quinlivanii Benson et al. serogroup 1                                        1442-AUS-E [CIP 105272] 
43877         Legionella moravica Wilkinson et al.                                                    316-86 [CDC 1634-CZK-E; CIP 103883] 
700508        Legionella taurinensis Lo Presti et al.                                                 Turin I no 1 



B) Secreted enzymes 

The secreted enzymes common to the three strains of L. pneumophila (Paris, Lens and 
Philadelphia) whose sequences are identified below can be used in particular as targets 
in colorimetric tests (or for developing such tests) for detecting the presence or 
absence of Legionella in a biological sample: 

- the sequence lpp0489 (SEQ ID 4292), which codes for a precursor of a eukaryotic-type 
glucoamylase (glucan 1,4-alpha-glucosidase) with no homologue in bacteria; 

- the sequence lpp1117 (SEQ ID 6477), which codes for a potential secreted chitinase; 
and 

- the sequences lpp1033 (SEQ ID 4267) and lpp1880 (SEQ ID 3675), which code for a 
protein similar to a secreted eukaryotic ectonucleoside triphosphate diphosphohydrolase 
(apyrase). 

Table I: Correspondence between the numbers assigned to the genes of the Paris strain 
on its contigs and the SEQ ID numbers in the sequence listing, the positions of the 
nucleic acid sequences coding for these genes on the contig sequences, their putative 
functions, and their specificity relative to the Philadelphia strain 

Table II « Best-BlastP »: Putative functions of the ORFs identified for the Paris strain 

Table III: Correspondence between the numbers assigned to the contigs and the SEQ ID 
numbers in the sequence listing 



Contig1    SEQ ID No. 1     Contig29   SEQ ID No. 29 
Contig2    SEQ ID No. 2     Contig30   SEQ ID No. 30 
Contig3    SEQ ID No. 3     Contig31   SEQ ID No. 31 
Contig4    SEQ ID No. 4     Contig32   SEQ ID No. 32 
Contig5    SEQ ID No. 5     Contig33   SEQ ID No. 33 
Contig6    SEQ ID No. 6     Contig34   SEQ ID No. 34 
Contig7    SEQ ID No. 7     Contig35   SEQ ID No. 35 
Contig8    SEQ ID No. 8     Contig36   SEQ ID No. 36 
Contig9    SEQ ID No. 9     Contig37   SEQ ID No. 37 
Contig10   SEQ ID No. 10    Contig38   SEQ ID No. 38 
Contig11   SEQ ID No. 11    Contig39   SEQ ID No. 39 
Contig12   SEQ ID No. 12    Contig40   SEQ ID No. 40 
Contig13   SEQ ID No. 13    Contig41   SEQ ID No. 41 
Contig14   SEQ ID No. 14    Contig42   SEQ ID No. 42 
Contig15   SEQ ID No. 15    Contig43   SEQ ID No. 43 
Contig16   SEQ ID No. 16    Contig44   SEQ ID No. 44 
Contig17   SEQ ID No. 17    Contig45   SEQ ID No. 45 
Contig18   SEQ ID No. 18    Contig46   SEQ ID No. 46 
Contig19   SEQ ID No. 19    Contig47   SEQ ID No. 47 
Contig20   SEQ ID No. 20    Contig48   SEQ ID No. 48 
Contig21   SEQ ID No. 21    Contig49   SEQ ID No. 49 
Contig22   SEQ ID No. 22    Contig50   SEQ ID No. 50 
Contig23   SEQ ID No. 23    Contig51   SEQ ID No. 51 
Contig24   SEQ ID No. 24    Contig52   SEQ ID No. 52 
Contig25   SEQ ID No. 25    Contig53   SEQ ID No. 53 
Contig26   SEQ ID No. 26    Contig54   SEQ ID No. 54 
Contig27   SEQ ID No. 27    Contig55   SEQ ID No. 55 
Contig28   SEQ ID No. 28    Contig56   SEQ ID No. 56 



WO 2005/049642 



PCT/IB2004/003578 






Table IV: Correspondence between the contigs of the Legionella pneumophila Philadelphia 
strain and the numbering of their sequences in the sequence listing 

Contigs of the Philadelphia strain    SEQ ID 
Contig1 SEQ ID N°3456 
Contig2 SEQ ID N°3457 
Contig3 SEQ ID N°3458 
Contig4 SEQ ID N°3459 
Contig5 SEQ ID N°3460 
Contig6 SEQ ID N°3461 
Contig7 SEQ ID N°3462 
Contig8 SEQ ID N°3463 
Contig9 SEQ ID N°3464 
Contig10 SEQ ID N°3465 
Contig11 SEQ ID N°3466 
Contig12 SEQ ID N°3467 
Contig13 SEQ ID N°3468 
Contig14 SEQ ID N°3469 
Contig15 SEQ ID N°3470 
Contig16 SEQ ID N°3471 
Contig17 SEQ ID N°3472 
Contig18 SEQ ID N°3473 
Contig19 SEQ ID N°3474 
Contig20 SEQ ID N°3475 
Contig21 SEQ ID N°3476 
Contig22 SEQ ID N°3477 
Contig23 SEQ ID N°3478 
Contig24 SEQ ID N°3479 
Contig25 SEQ ID N°3480 
Contig26 SEQ ID N°3481 
Contig27 SEQ ID N°3482 
Contig28 SEQ ID N°3483 
Contig29 SEQ ID N°3484 
Contig30 SEQ ID N°3485 
Contig31 SEQ ID N°3486 
Contig32 SEQ ID N°3487 
Contig33 SEQ ID N°3488 
Contig34 SEQ ID N°3489 
Contig35 SEQ ID N°3490 
Contig36 SEQ ID N°3491 
Contig37 SEQ ID N°3492 
Contig38 SEQ ID N°3493 
Contig39 SEQ ID N°3494 
Contig40 SEQ ID N°3495 
Contig41 SEQ ID N°3496 
Contig42 SEQ ID N°3497 
Contig43 SEQ ID N°3498 
Contig44 SEQ ID N°3499 
Contig45 SEQ ID N°3500 
Contig46 SEQ ID N°3501 
Contig47 SEQ ID N°3502 
Contig48 SEQ ID N°3503 
Contig49 SEQ ID N°3504 
Contig50 SEQ ID N°3505 
Contig51 SEQ ID N°3506 

Table V: Surface proteins of Legionella pneumophila Paris strain 

Surface proteins specific to the Paris strain are indicated in bold. 

SEQ ID   IPF       Annotation/similarity to other proteins 
3410     94.7      Surface protein, adhesion protein of Streptococcus sp. and Pseudomonas sp., Rtx toxin 
704      202.3     Surface antigen of Bordetella sp. and Coxiella burnetii 
746      209.2     Surface protein of Wolbachia sp. 
2267     440.4     Surface protein of Mycoplasma hominis, Streptococcus 
2751     514.5     Protein similar to 440.4 
3192     627.1     Surface protein of Streptococcus pyogenes (« collagen-like ») 
3218     663.2     Lipopolysaccharide biosynthesis, O-antigen acetylase of Pseudomonas aeruginosa 
3221     667.3     Surface antigen of Trypanosoma cruzi 
3222     668.4     IcmE of Legionella pneumophila 
3317     803.3     Flagellar protein (« L-ring protein ») 
3324     817.7     Surface protein of Mycoplasma hominis 
136      1115.4    Rtx toxin of Magnetococcus sp., putative lipoprotein of Leptospira kirschneri 
171      1171.3    Protein of « surface exclusion » type of Salmonella typhimurium 
310      1391.1    Proton transporter of Coxiella burnetii 
337      1429.4    Rtx toxin, surface protein of Bacillus cereus 
481      1653.3    Plasminogen activator of Yersinia pestis, protease associated with the cell envelope 
527      1724.3    Cell envelope hydrolase of Pseudomonas putida 
652      1910.6    Protein similar to 440.4 
664      1933.4    Protein similar to 440.4 
893      2343.2    Surface antigen of Rickettsia sp. 
972      2448.3    Hypothetical protein of Coxiella burnetii, periplasmic protein 
1148     2727.2    Surface protein of Spirochete 
1298     2968.3    FimV, pilus assembly, of Legionella pneumophila 
1361     3059.2    Immunogenic protein of Legionella pneumophila 
1503     3271.3    O-antigen acetylase of Pseudomonas aeruginosa 
1521     3299.3    Agglutinin, adhesin, surface protein of Brucella melitensis 
1576     3374.1    Surface antigen of Magnetospirillum magnetotacticum 
1651     3496.2    Histidine-rich glycoprotein of Plasmodium lophurae 
1755     3636.1    Lipoprotein Pal of Legionella pneumophila 
1847     3780.2    Surface protein of Plasmodium falciparum 
1877     3827.2    O-antigen acetylase of Pseudomonas aeruginosa 
2224     4347.7    Outer membrane protein of Burkholderia fungorum, surface antigen of Rickettsia sp. 
2406     4608.1    Antigen of erythrocytes infected by Plasmodium 
2843     5349.3    Major surface protein of Anaplasma marginale, hypothetical protein of Plasmodium falciparum 
2930     5526.2    Adhesin, virulence protein of Escherichia coli 
3037     5739.3    Protein of « surface exclusion » type of Pseudomonas putida, Enterococcus faecalis, Rtx toxin 
3139     6037.1    Surface antigen of Entamoeba histolytica and Plasmodium falciparum 
3157     6079.1    Surface antigen of Trypanosoma cruzi 
3165     6097.1    RtxA protein of Legionella pneumophila 
3181     6131.1    Adhesin/surface protein of Pseudomonas putida, Enterococcus faecalis, Rtx toxin 



Table VI: Proteins involved in the biosynthesis of the cell-envelope polysaccharides 
of Legionella pneumophila 

SEQ ID   IPF       Annotation/similarity with other proteins 
1126     269.1     Heptosyl transferase, biosynthesis of lipopolysaccharides in Coxiella burnetii 
3218     663.2     O-acetyl transferase, modification of lipopolysaccharides in Vibrio cholerae 
288      1360.6    Protein involved in biosynthesis of lipopolysaccharides in Methanosarcina 
632      1882.2    Polysaccharide deacetylase of Coxiella burnetii 
917      2371.1    CapM protein of Rickettsia conorii, glycosyltransferase 
1503     3271.3    O-antigen acetylase of Pseudomonas aeruginosa, modification of lipopolysaccharides 
1555     3348.2    Predicted xylanase/chitin deacetylase of Cytophaga hutchinsonii 
1877     3827.2    O-antigen acetylase of Pseudomonas aeruginosa, modification of lipopolysaccharides 
1928     3923.2    Potential nucleoside-diphosphate-sugar epimerase of Thermoanaerobacter tengcongensis 
1963     3980.1    Phosphopantetheine adenylyltransferase of Ralstonia metallidurans, biosynthesis of lipopolysaccharides 
2204     4323.1    Nucleoside-diphosphate-sugar pyrophosphorylase 
2212     4334.1    Polysaccharide deacetylase, xylanase/chitin deacetylase 
2243     4371.1    Polysaccharide deacetylase, xylanase/chitin deacetylase 
2324     4488.1    Aminotransferase, synthesis of lipopolysaccharides 
2378     4567.2    WciT of Streptococcus pneumoniae, biosynthesis of polysaccharides 
2410     4616.2    Biosynthesis of O-antigen, hypothetical protein of Coxiella burnetii 
2411     4618.1    Biosynthesis of lipopolysaccharides, glycosyltransferase 

Table X: Correspondence between the numbers assigned to the contigs of L. pneumophila 
Philadelphia and the SEQ ID numbers in the sequence listing 

Contig1 SEQ ID N°7061 
Contig2 SEQ ID N°7062 
Contig3 SEQ ID N°7063 
Contig4 SEQ ID N°7064 
Contig5 SEQ ID N°7065 
Contig6 SEQ ID N°7066 
Contig7 SEQ ID N°7067 
Contig8 SEQ ID N°7068 
Contig9 SEQ ID N°7069 
Contig10 SEQ ID N°7070 
Contig11 SEQ ID N°7071 
Contig12 SEQ ID N°7072 
Contig13 SEQ ID N°7073 
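The contig tables (Tables III, IV and X) each follow a fixed numerical offset between contig number and SEQ ID number, so the lookup can be sketched as a small function. The offsets and contig ranges below are read from the tables as printed; the dictionary and function names are hypothetical and are introduced only for illustration.

```python
# table label -> (first contig, last contig, offset added to the contig number)
CONTIG_TABLES = {
    "III": (1, 56, 0),     # Contig n -> SEQ ID No. n
    "IV": (1, 51, 3455),   # Contig n -> SEQ ID N°(3455 + n)
    "X": (1, 13, 7060),    # Contig n -> SEQ ID N°(7060 + n)
}


def seq_id_for_contig(table: str, contig: int) -> int:
    """Return the SEQ ID number listed for a contig in the given table."""
    first, last, offset = CONTIG_TABLES[table]
    if not first <= contig <= last:
        raise ValueError(f"Table {table} lists contigs {first} to {last}, got {contig}")
    return contig + offset


print(seq_id_for_contig("X", 9))
```

A range check is included because the three tables cover different numbers of contigs, and a contig number outside a table's range has no SEQ ID in that table.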



Table XI: List of the sequences of L. pneumophila Philadelphia identified as specific to 
this strain relative to the Paris and Lens strains, and positions of these sequences on 
the contigs 

Indication on the specifics of the Philadelphia strain 

IPF Lp Philadelphia   Contig     SEQ ID          Position 1   Position 2 
1007.1                CONTIG9    SEQ ID N°7069   1062411      1062962 
10563.1               CONTIG13   SEQ ID N°7073   1463         2446 
3775.3                CONTIG13   SEQ ID N°7073   1463         2446 
1067.1                CONTIG7    SEQ ID N°7067   163133       163567 
1980.3                CONTIG7    SEQ ID N°7067   163628       163918 
1102.1                CONTIG7    SEQ ID N°7067   189792       190835 
1109.1                CONTIG7    SEQ ID N°7067   195874       198036 
4935.1                CONTIG9    SEQ ID N°7069   1604318      1605460 
7686.1                CONTIG9    SEQ ID N°7069   1604318      1605460 
1771.2                CONTIG8    SEQ ID N°7068   552424       553377 
1773.1                CONTIG9    SEQ ID N°7069   961264       962745 
1296.1                CONTIG9    SEQ ID N°7069   961264       962745 
1297.1                CONTIG9    SEQ ID N°7069   959864       960817 
1298.1                CONTIG9    SEQ ID N°7069   959562       959810 
1302.1                CONTIG9    SEQ ID N°7069   958145       958699 
1303.1                CONTIG9    SEQ ID N°7069   957452       957922 
1307.1                CONTIG9    SEQ ID N°7069   956035       956523 
1309.1                CONTIG9    SEQ ID N°7069   955209       955589 
1310.1                CONTIG9    SEQ ID N°7069   954726       955034 
1312.1                CONTIG9    SEQ ID N°7069   953857       954711 
1313.1                CONTIG9    SEQ ID N°7069   953085       953864 
1315.1                CONTIG9    SEQ ID N°7069   952598       953161 
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1110 1 




SFn in 7\T o 7060 


CK0Q9^» 

73U7ZU 


7 J 1 UJO 


1 190 1 


PONTTGQ 

wv_/i> l lKjy 


vpf) TD "M°7060 

OUA^ XVJ IN /UU7 


048779 
74o 1 1 A 


0S0907 


1 191 1 


PONTTP0 


qpo rn >j o 706o 

Or>V^ LLJ IN /UD7 


yHo 1 ou 


04874*3 


1 199 1 


PmsJTTPO 


OJ_/\^ ILv IN / \j\jy 


Q4779A 
74 / /ZO 


0481 87 
y-xo 10/ 


1 191 1 


PONTTP0 

W WIN 1 1U7 


oJj,\^ IL-/ IN /UO7 


0471 ^4 


047706 

7*T / / W\7 


1 194 1 


v^vyiN i i\jy 


oC/y 1J_^ IN /\J\jy 


yn-ou 1 1 


04668 S 


1 19S 1 


PfYNTTOO 


oJ-/V< A-L' IN /U07 


74j 1 OZ 


04^869 

74JOUZ 


1 197 1 


PONTTO0 

W WIN 1 IVJ ~ 


OJ_>V^ I-L' IN / KJKjy 


044 1 9 S 


04S000 


1 198 1 


POTsJTTPO 


uCy JUL^ IN /U07 


74j oUU 


044988 

74tZ O O 


1 110 1 


PONTTO0 


qpo rn isj°70^o 

OL/V^ 1JL7 IN / \J\jy 




04^S^S 

7TJJJ J 


1111 1 


PONTTP0 


cpo rn >j°70^»o 


049Q^R 
y-r£,y D 0 


04394^ 


1 119 1 


POTsTTTPO 


cpn TD M o 70^0 


049^71 

7t"^-J / 1 


0490^4 

7T - Z7J 4 T 


1 111 1 


POTsJTTPO 


cpn TF> lsJ o 70fiQ 


041 

"4 1 OJJ 


049^68 

74ZJUO 


1 114 1 


PONTTP0 

vUlN 1 1VJ7 


CPO TD M o 70^0 


04fRfi7 


041 6S^ 


1 117 1 




ui_-y i-L/ in / \j\jy 


04norn 

y-r\j\J\JD 


040949 


1 110 1 


PfYWTTPO 

W WIN I IvJ J7 


cpn rn isi°7n^o 

O-L/V^/ LLJ IN /U07 


0^74^0 
7J /HDy 


0400^6 


1 140 1 




cpn rn 7si°7n^o 


0^71 1 1 
y>> /111 


0^7446 
7J / 44D 


1 141 1 




^IFO TF> >J°70^0 

oJC-V^ IN /UO7 


7JUJUO 


0^71 1 4 

7J / 1 14 


1 149 1 


wwiN 1 IKjy 


cpn m xr o 70AQ 


7JJJ 14 


0^6^00 


1 141 1 


V^/UiN 1 1U7 


^FO TFi "M o 70^0 
oiii/v^/ llj in /uoy 


7j4ojU 


7JJJ 1 -7 


1 144 1 




cpn rn 7si°70^o 

oE/y 1J_^ IN / \j\jy 


O^^OI 0 
yj D\j 1 y 


0^4800 

7J407U 


1 14S 1 


wwiN 1 1U7 


O-C/V^ LLJ IN /UO7 


0^9944 

7JZZ44 


7jjUZO 


1 146 1 




cpn TFi >J 0 7fi6Q 


0^1 R^9 

7J 1 OjZ 


0^99^4 

7JZZJ4 


1 147 1 


PfYWTTPO 


cpn m 7\i°7n^o 

OJQv< 11-' in / \j\jy 


0^0,4^0 

7Ju4JU 


0^ 1 8^S 

7 J I D 


9198 1 


POTxJTTPI 


^FO TD T\T°7fi6^ 

OJjy 1J_7 IN / \J\JD 


740 


1 606 


1 148 1 


PPfMTTOO 


^Fn m xr°7n^o 

k3Jj/V^ LLJ IN / \JKjy 


y z* 1 1 z**-t 


0^0444 


1 ISO 1 

I JJU. 1 


W WIN 1 l\jy 


cpn rn Tsr°70fiQ 


y a. 1 z. jvj 


097790 

7Z / / A\J 


1 1S1 1 

I -7^7,7 . 1 


POTsTTTPO 


ODy LU IN j \j\jy 


09S900 
y z*o £*yy 


0971 88 

7Z / 1 O O 


1 1SS 1 


POMTTPO 

V/U1N 1 1VJ7 


cpn TF> >J°706Q 

o£jV^ IJLy IN /UO7 


01 0^^ 

y 1 7JJJ 


09^978 

7ZJZ / O 


1 1S6 1 


PfYNFTTfrQ 
wv_/in x i vj y 


cpn rn TnT o 7060 


01 801 0 


01 01 %1 

y 1 y 1 j / 


1 1S8 1 

1 ,7,70. 1 


PPTNTTTOO 

1 1\J7 


OJJ/V^/ ii_y in / W7 


01 <^SR1 

!7 1 UJO 1 


01 741 7 

7l / *T 1 / 


1 ISO 1 


PPfMTTOO 


cpn TD TST°70^0 

ODU 11-/ IN / UU7 


01 6070 
y 1 uu / " 


01 6S70 
y 1 \jo 1 \j 


1 160 1 




cpn TD KT°706Q 

OLZ/V^ ll_7 IN /V\J37 


01 4460 


01 6040 
y 1 out7 


1 161 1 


POTnTTTPO 


cpn TD T\T°7<ViQ 

iDLLKJ^ LLJ IN / \J\jy 


01 ^807 
y 1 JO7 / 


01 4971 

y 1 4Zr / 1 


1 16S 1 


WWiN 1 1U7 


cpn TD >J°706Q 
oDy 11-/ IN /\j\jy 


01 1 S90 

"i 1 oz*y 


01 1040 

y 1 31/47 


1 166 1 


POIMTTOO 
W WIN i lKjy 


cpn TD M°70^0 
oCy IIv IN / WO7 


01 0007 
y 1 uuu / 


01 14S8 

7l 14 JO 


1 167 1 


PfYNFTTOO 


cpn TD 7\r°70^Q 

OX_/V^ 11_7 IN / W7 


00040S 


000701 
y\jy / w 1 


1 168 1 


POTsTTTPO 

V^V71N 1 1 VJ ^ 


cpn TD T\T o 70fiQ 

O-Cy LLJ IN / \J\jy 


008^97 


000^00 

7U7JU7 


1 160 1 

1 JO?. 1 




cpn TD 7\T°7nfiQ 


008004 

7VOUU4 


008S04 


1 170 1 


PfYNTTPrO 


cpn TD >J°70^0 


0071 SI 
y\j / 1 ji 


007001 

y\j 1 77J 


1 171 1 


POTnJTTPO 

W WIN 1 1U7 


cpn TD 7ST 0 70fiQ 


00S964 


007060 
yyj / \j\jy 


1373.1                CONTIG9    SEQ ID N°7069   903498       905264 
1376.1                CONTIG9    SEQ ID N°7069   902217       902987 
1378.1                CONTIG9    SEQ ID N°7069   900968       901987 
1380.1                CONTIG9    SEQ ID N°7069   899639       900985 
1382.1                CONTIG9    SEQ ID N°7069   898691       899314 
1383.1                CONTIG9    SEQ ID N°7069   898266       898532 
1384.1                CONTIG9    SEQ ID N°7069   897979       898281 
1386.1                CONTIG9    SEQ ID N°7069   897062       898066 
1435.1                CONTIG9    SEQ ID N°7069   864915       866060 
1492.1                CONTIG9    SEQ ID N°7069                821579 
1494.1                CONTIG9    SEQ ID N°7069   820792       821109 
1500.1                CONTIG9    SEQ ID N°7069   814942       815592 
1501.1                CONTIG9    SEQ ID N°7069   814765       815241 
1503.1                CONTIG9    SEQ ID N°7069                813485 
1518.1                CONTIG9    SEQ ID N°7069   800919       803852 
1519.1                CONTIG9    SEQ ID N°7069   799375       800655 
1520.1                CONTIG9    SEQ ID N°7069   796924       799023 
1538.1                CONTIG9    SEQ ID N°7069   775795       776787 
1627.1                CONTIG8    SEQ ID N°7068   454194       455027 
1631.1                CONTIG8    SEQ ID N°7068   456917       457408 
1635.1                CONTIG8    SEQ ID N°7068   459876       460751 
1663.1                CONTIG8    SEQ ID N°7068   477274       477810 
1674.1                CONTIG8    SEQ ID N°7068   488306       490213 
1676.1                CONTIG8    SEQ ID N°7068   491037       491288 
1723.1                CONTIG8    SEQ ID N°7068   522733       523200 
1724.1                CONTIG8    SEQ ID N°7068   523306       523560 
1725.1                CONTIG8    SEQ ID N°7068   523670       523945 
1749.1                CONTIG8    SEQ ID N°7068   537066       537341 
1760.1                CONTIG8    SEQ ID N°7068   544546       545574 
1767.1                CONTIG8    SEQ ID N°7068   549553       550482 
1772.1                CONTIG6    SEQ ID N°7066   HQ           1804 
1785.1                CONTIG9    SEQ ID N°7069   970464       970808 
1787.1                CONTIG9    SEQ ID N°7069   970976       971416 
1892.1                CONTIG7    SEQ ID N°7067   76315        77967 
3771.3                CONTIG13   SEQ ID N°7073   43581        46268 
1952.1                CONTIG9    SEQ ID N°7069   1328145      1328393 
1981.1                CONTIG9    SEQ ID N°7069   1347920      1348357 
2005.1                CONTIG9    SEQ ID N°7069   1364695      1366614 
2006.1                CONTIG9    SEQ ID N°7069   1366816      1368231 
2026.1                CONTIG9    SEQ ID N°7069   1380350      1380739 
2059.1                CONTIG9    SEQ ID N°7069   1407671      1408075 
2066.1                CONTIG9    SEQ ID N°7069   1418764      1421481 
2083.1                CONTIG9    SEQ ID N°7069   1436828      1439185 
2084.1                CONTIG9    SEQ ID N°7069   1439377      1440240 
2086.1                CONTIG9    SEQ ID N°7069   1440346      1441680 
2125.2                CONTIG9    SEQ ID N°7069   1470612      147147? 
2132.1                CONTIG11   SEQ ID N°7071   144883       145770 
2133.1                CONTIG11   SEQ ID N°7071   144142       144882 
2134.1                CONTIG11   SEQ ID N°7071   143221       143718 
2135.1                CONTIG11   SEQ ID N°7071   143204       144145 
2141.1                CONTIG11   SEQ ID N°7071   137310       137879 
2202.1                CONTIG11   SEQ ID N°7071   87995        89587 
2311.1                CONTIG8    SEQ ID N°7068   553457       554560 
2312.1                CONTIG8    SEQ ID N°7068   554678       555862 
2314.1                CONTIG8    SEQ ID N°7068   555930       556454 
2315.1                CONTIG8    SEQ ID N°7068   556495       556839 
2316.1                CONTIG8    SEQ ID N°7068   556651       558360 
2317.1                CONTIG8    SEQ ID N°7068   558425       558796 
2319.1                CONTIG8    SEQ ID N°7068   559260       560681 
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2321.1 


CONTIG8 


cpn TO N°7068 

O AL VV AAV IN / UUO 


jOUoj 1 


JUZ Jt / 


2323.1 


CONTIG8 

V—^Wl > A 1VJO 


cpD TO N°706R 

O A_V W^ AAV IN / UUO 




5699R1 

JUZI70 I 


2324.1 


GONTIG8 

v^v/i > A AVJ O 


9FO TO N°706R 

i3A_,VV^ A A-/ IN / UUO 


jOj 1 1 0 


56^R1 ^ 

JUJ O 1 J 


2325.1 


CONTIG8 


<sFO TTj N°706R 

O A_/VJJ IAV IN /UUO 


SA^RQQ 


564030 

JU^7JU 


2326.1 


GONTTG8 

V^l > A AVJ O 


O A-'VV^ AAV IN /UUO 


jo^yz / 


5661 1% 
juu 1 z j 


2330 1 


GONTTG8 

V-'VJIN A 1VJO 


CFO TO N°706R 

OJ-yy AAV IN /UUO 


jO /jOj 


567054 

JU / Z7 J*T 


2333 1 

-J J . A 


GONTTG8 

V^V/I> A AVJ O 


Qpo TO N°706R 

OJ-/V^ AAV IN /UUO 


S^R1 90 
JOo 1 ZAj 


560797 

JO" /z / 


2337 1 


CONTTG8 

V<V/1 1 A AVJO 


9FO TO N°7068 

OJuy AA_-/ IN / UUO 


^79001 


57^461 

J / JtU 1 


2343 1 


GONTTGR 

V^ VJI > A AVJ O 


CFO TO N°70fi£ 

o AZ/V^ li-y IN /UUO 


J / J7JJ 


^7R64R 

J / OUH-O 


2345 1 


GONTTG8 

V'Vyli A AVJO 


O AJ/V^ AAV IN / UUO 


^7R^i^S 
J / OOjj 


^7QR01 

J / Z/OZ7 A 


2346 1 


CONTTG8 

V^ V^/J_> A AVJO 


CFO TFj M°70fiR 

OXiy AA-V IN /UUO 


S70RRR 
j / 7000 


^R1 094 

JO I vZ*t 


2348 1 


GONTTG8 

V^ v_y 1 > A AVJO 


SFO TO N°7068 

Oi-/y A A-/ IN / UUO 


5R1 06^ 

JO 1 UU J 


5R41 01 

JOT- 17l 


2350 1 


CONTTG8 

V_^Vyi> A AVJO 


9FO TO N°706R 

OA_/VJj AAV IN / UUO 




5R6^71 

. joUj / 1 


2357 1 

J J / . A 


GONTTGR 

V-/VJ1> A AVJO 


CFO TTj 1\J°70^R 

ODy AAV IN / UUO 


S001 70 

J7U 1 / U 


501 0^^ 

J7 1 Uj j 


2373 1 


GONTTG8 

v^v_y±> A AVJO 


OLiy AAV IN /UUO 


^01 RR^ 

UU 1 OOJ 


609900 


2385 1 


GONTTGR 


9FO TO N°70^R 

OIJ»v^ JLLV IN /UUO 


Ul J /Uj 


61 ^90 

U 1 JJZ7 


2465 1 


GONTTGl 9 

V^/ VJj.\ A AVJ A Z 


cpn TTj 7\T°7079 

OJ-*v^ AAV IN /U/Z 


7O7J / 


Q7Q1 ^ 


2466 1 


GONTTG1 2 

V— 'VjlN A IVJ 1 Z 


CFO TO N°7079 

OLy AAV IN / U / Z, 


07RRR 
y / 000 


0R47R 
yo^f / 0 


2467 1 


GONTTGl 9 

Vx WIN A AVJ I Z 


^FO TO N°7079 

oJ-zy AAV IN /U/Z, 


QR400 

70 £ T77 


QQRO^ 


2468 1 

Z»^UO . 1 


GONTTG1 9 

V_xWIN 1 AVJ 1 Z 


^FO TO N°7079 

OAJ/V^ AAV IN /U/Z, 


QQR90 


I UUO jo 


2469 1 


GONTTGl 2 

WWIN 1 AVJ 1 Z 


QFO Tr> isj°7079 

ODy AAV IN /U/Z 


1 00647 

1 UUUt- / 


1 01 ^7^ 
1 U 1 j / J 


Zr*T / U. A 


GONTTG1 9 

V^WlN AlVJlZ. 


cpO TT, >J 0 7n79 
oJCv^ ilV In /U/z 


1 01 670 
iviO/7 


1 UzoOO 


2471 1 


GONTTGl 9 

WIN 1 IVJ 1Z, 


^IFO TO N°7079 

OAJ/V^ liV IN /U/Z 


1 0^1 OR 
1 Uj 1 Uo 


1 04^70 
1 UtO /y 


2472 1 


GONTTGl 9 

A AVJ 1 Z 


9FO TO N°7079 

OJ-/y AAV IN /U/Z 


1 04^ R0 
1 UtJ ou 


1 0555R 

1 U J J JO 


2473.1 


GONTTG1 2 

V*' V_y 1 N A AVJ A Z» 


SFO TO N°7079 

OA_fVV AAV IN /U/Z 


1 055RR 

A U J JOO 


1 06^70 
1 UOJ /u 


2498 1 


GONTTG1 7 

V^ V/l > A AVJ A Z 


qfo TTj T\T°7079 

ODy AAV IN /U/Z 


1 941 ^5 

I Z*T 1 J J 


1 94S45 


2500 1 

*J \J\J . A 


GONTTGl 2 

V-/V/1N A AVJ A Z* 


SFO TO N°7079 

OAJ/V^ AAV IN /U/Z 


1 9^994 

1 Z J 7Zt 


1 961 1 0 
1 ZU I I u 


2684 2 


GONTTGR 

wwin a ivjo 


SFO TO N°70fiR 

ODy AAV IN / UUO 


7401 7^ 

/t-U 1 / J 


7404^0 

/ H-UH-JU 


2710 1 


CONTTG9 

V-' W I > A AVJ Z7 


SFO TO N°7069 

Oby AAV IN / UU7 


1 77R54R 

1 / / O JtO 


1 7R09R1 
1 / ouzo 1 


2743.1 


GONTTG9 

V_> V-/1 > A AVJ Z* 


SFO TO N°7069 

OJUy AAV IN / UUZ7 


1 R1 01 90 
1 0 1 u 1 zu 


1 R1 0^5^ 

IOI UJJJ 


2769 1 


GONTTG9 

. Vvli 1 AVJ 37 


QFO TO N°7069 

O AJ/ VV^ AAV IN / UU" 


1 R96659 

I OZrUU JZ. 


1 R9R569 

1 OZOJOZ 


2782 3 


GONTTGR 

W WIN A AVJO 


CFO TO N°70^i8 

ODy AAV IN /UUO 


7R46Q 


70R75 

/70 / J 


5631 2 

J \J J A .Z- 


GONTTGR 

v_y 1 > A AVJO 


SFO TO N°7068 

OJuy AAV IN / UUO 


7R469 


70R75 
/ 70 / j 


2784 1 


GONTTGR 

V^- V_/l > A AVJO 


QFO TO N°706R 

OAJ/V^/ AAV IN / UUO 


7^057 

/ JUJ / 


7^^0R 
/ JJ7O 


2889 1 


GONTTGl 3 

V^Wl > A AVJ A J 


CFO TO N°707^ 

OU/y AAV IN / U / J 


1 0400 
1 7HU7 


90090 
ZAjxjZAj 


2890 1 


GONTTG1 3 

V/W1> A AVJ A J 


CFO TO N°707^ 

OLy AAV IN / U / J 


1 R^05 
1 0 jU j 


1 01 69 
I71 oz 


2892 1 


GONTTGl 3 

V-/1 > A AVJ A J 


SFO TO N°7073 

Oliy AAV IN / U / J 


1 765^ 

1 / U J J 


1 7007 

1 / Z7\J 1 


2894.1 


CONTTGl 3 

V/VJl > A AVJ A J 


9FO TO N°707^ 

ODy AAV IN /U/J 


1 JU / u 


1 7659 

1 / OjZ 


2895 1 

J. A 


GONTTGl 3 

V_/VJ1> A AVJ A J 


9FO TO N°707^ 

OI_/V</ AAV IN /U/J 


1 51 05 

A J 1 7J 


1 56R^ 
IjUoj 


2896 1 


GONTTGl ^ 

W Wl> A AVJ 1 J 


CFO TO N°707^ 

OJCy AAV IN /U/J 


1 41 4^ 

l*tl *T J 


1 40S9 


3353 1 

J J J J . A 


GONTTGl 1 

V/U1> A IVJ 1 J 


^FO TO N°707^ 

OCy AAV IN /U/J 


1414^ 


1 4Q^9 

1 t7JZ 


2915.1   CONTIG13   SEQ ID N°7073   21609   22910
3909.2   CONTIG13   SEQ ID N°7073   21609   22910
2916.1   CONTIG13   SEQ ID N°7073   23001   24515
3908.2   CONTIG13   SEQ ID N°7073   23001   24515
2917.1   CONTIG13   SEQ ID N°7073   24645   24908
2918.1   CONTIG13   SEQ ID N°7073   24971   26290
2919.1   CONTIG13   SEQ ID N°7073   26453   26965
2920.1   CONTIG13   SEQ ID N°7073   27050   27442
2921.1   CONTIG13   SEQ ID N°7073   27535   28215
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101 R 1 


GONT1G1 3 




41 ^6^ 

*+ 1 DVJD 


42778 


10565 1 


CONTTG9 

W 1 > 1 1VJ J7 


<sFO TD N°706Q 

OJJ/Vj/ XXJ 1 > 1 VJVJjs 


1 190QS6 

1 DZ.VJZ7 DVJ 


1122047 

1 V7" / 


1947 2 


CONTTG9 

V/wll 1 1VJ -7 


9FO ID N°706Q 

OJuy XXJ IN /UU7 




1 122047 

1 D ^^VJ^T / 


101 Q 1 


OONTTGQ 

V^-V7lN 1 1VJ27 


CFO TD N°706Q 

ul_/y XXJ IN / UU7 


1 3ZU27 DO 


1 199047 

I DZ*Z*VJ'-T I 


1020 1 


CONTTGQ 

V^WIN 1 1VJ27 


9FO TF) N°706Q 

OU-y 1_L/ IN IVJVJZ7 


1 190447 


1 190Q05 

1 D ZsVJzsVJD 


1021 1 


CONTTGQ 

\^VJ1>1 1 1VJ~ 


cpn TD N°706Q 

V</ 1J_7 IN IVJVJZJ 


1 1 1 Q16^ 


1 1901 5Q 

1 D A,\J X DZ7 


1021 1 


CONTTG9 

V^V/ll 1 IVJ J7 


SFO TF) N°706Q 


1 11 7590 

1 J 1 / DZAJ 


1 11 Q215 


1024 1 


GONTTGQ 

V^WIN 1 1VJ.7 


cpn TF) N°706Q 
ocy in ivjvjy 


1 1 1 ^^»S1 
X D X OOjj 


1 11 7507 

X D X 1 DVJ 1 


1095 1 


GONTTGQ 

V^-wlN 1 1 VJ27 


crn TF) N°706Q 

IJ-J 1 IN /U07 


1^1 ^AA^ 


1 1 1 6551 
XD x vjDD x 


1027 1 


GONTTGQ 


cpo TD N°706Q 


1 11 4Q06 
l d x Hyvjvj 


1 1 1 51 Q0 

iJiJl 7U 


1046 1 


GONTTGQ 

V^VJIN 1 1VJ;7 


o Y2j\^ LLJ IN /UU" 


1 9GRQ9A 


1 9QQ9R9 
1Z77Z0Z 


1047 1 


GONTTGQ 
v^win i ivj y 


9FO TF) N°706Q 


1 9QR61 S 
1 Z27 00 1 D 


1 9QQ71 9 

1 Z-727 / 1 Z 


10R7 1 

JWO / . 1 


GONTTGQ 

V^V7l> 1 1VJ-7 


QT?n TF) N°706Q 


1 01 6609 

XVJ X VJVJVJZ. 


1 01 7490 


10Q7 1 

DVJZs 1 . X 


OONTTG1 1 

V^VyiN 1 IVJ 1 1 


9FO TF) N°7071 

OJj/VJ_ 1U IN / VJ / X 


1 R1699 


1 R41 R9 

1 OH- 1 OZ 


1100 1 

J 1 \7\J . 1 


GONTTG1 1 


<sFO TH N°7071 

OXZ,\/ XXJ IN I \J I X 


1 89447 


1 R9740 

X OZ / HVJ 


1101 1 

J1U1.1 


GONTTG1 1 

V^VJUN X 1 VJ 1 1 


9FO TD N°7071 

OJ-/\y 1 J_-/ IN / VJ l X 


1 OZ-UOU 


1 R940Q 


1102 1 


OONTTG1 1 

V^VJIN 1 IVJ 1 1 


cpn TH N°7071 

O X_/ V^ 11-7 IN f VJ 1 X 


1 R05Q5 

X OVJDzJD 


1 R1 Q65 

lOl 7UJ 


1101 1 


OONTTG1 1 

V-zV-JlN X IVJ 1 1 


<sFO TF) N°7071 

OXJ/V^ IJV IN 1 VJ 1 X 


1 7Q1 91 


1 R0461 


1104 1 

.7 1 V7~T. 1 


GONTTG1 1 

V-'VJUN 1 IVJ 1 1 


Qpn TF) N°7071 

OUy 1J_7 IN 1 VJ 1 X 


1 77715 


1 7Q0Q1 

X 1 Z7VJZ7D 


1105 1 


OONTTG1 1 

V^VJl > 1 1VJ1 1 


cpn TF) N°7071 

ODy 1 L/ IN IVJ 1 X 


1 7646Q 

1 / VJ'-rVJZ7 


1 7760S 

XII VjVJD 


1117 1 


GONTTGQ 

V^VJIN 1 IVJ z/ 


<sFO TF) N°706Q 

OUy 11_7 IN / \JVJ27 


1 R617Q5 

1 OVJD 1 7J 


1 R64146 

X OUt D'-rVJ 


26R5 2 


GONTTG1 1 

V^VJUN 1 IVJ 1 D 


cpn TF) N°7071 

ocy JJ_-/ IN IVJ 1 D 


1 9699 


1 16R1 

X DVJOD 


1156 1 


GONTTG1 1 

V^ VJIN 1 1 VJ I D 


9FO TD N°7071 

OXZ\J XXJ IN IVJ ID 


1 9699 


1 DvjoD 


1157 1 

D D D I . 1 


GONTTG1 1 

V^VJIN X iVJl D 


<sFO TD N°7071 

OCy 1JL/ IN IVJ ID 


1 1 491 

1 1 T-Z I 


1 9£1R 


1162 1 


GONTTG1 1 

V^V/IN 1 IVJ 1 D 


cpo TD N°7071 

OCy XXJ IN 1 VJ 1 D 


7015 


7R15 


1164 1 

•7.7 \J*T. 1 


OONTTG1 1 

V^VJIN 1 IVJ 1 J 


CPA TD N°7071 

OlJ/VJ^ 1 JL/ IN IVJ ID 


S9QR 

JZ70 


64Q7 


1165 1 


GONTTOI 1 

v^v_7l > 1 IVJ 1 J 


CFO TD N°7071 

OJ_>V^ 11-7 IN 1 VJ 1 D 


479 R 

■ /Zo 


^1 41 


14 1 


GONTTGQ 

V.^ V7l> 1 1VJ-7 


QFO TD N°706Q 

ODy 1J-/ IN 1 VJVJZ7 


40111 


41707 


15 1 

D D . 1 


GONTTGQ 

vyVylN X IVJ _7 


^sFO TF) N°706Q 

ODy 1J_J IN 1 VJVjy 


44061 


44171 

H-HD 1 1 


151R 1 

•J .7 J O • 1 


GONTTGQ 

V-'VJ'IN 1 I VJ 27 


CFO TF) N°706Q 

OH/y XXJ IN / UU-7 


1 RR1Q47 


1 RR51 17 

X OOD X D 1 


1510 1 


GONTTGQ 


CFO TF) N°706Q 

OXZ-V^ 1LJ IN 1 VJVJzf 


1 RR594Q 


1 00 DDvJVJ 


1541 1 


GONTTGQ 


SFO TF) N°706Q 

LJ I—* V7 XXJ IN / UU7 


1 RR7171 

lOO ID ID 


1 RRRR1 5 
100001 j 


1542 1 

«7 ~J^jC . 1 


GONTTGQ 

V-x 1 1 VJ 2? 


9FO TF) N°706Q 

ul_/y llV IN 1 VJ\JZ7 


1 RRRQQ7 


1 RRQ11 1 

1 OO7J 1 1 


1544 1 

*J .7 1^ . 1 


GONTTGQ 

V>V71N 1 IVJ 27 


CFO TF) N°706Q 

ODy 11-7 IN 1 VJVJzJ 


1 RRQ661 

1 00-7VIU.7 


1 RRQQ69 

1 00!727UZ 


1545 1 

.7 J. 1 


GONTTGQ 

V^ VJ1N 1 IVJ 27 


<sFO TF) N°706Q 

ODy I IV IN IVJVjzJ 


1 RQ0061 

1 OZ7VJVJVJD 


1 RQ09R4 


154R 1 

D JtO. 1 


GONTTGQ 

V>V71 N 1 1 VJ27 


<sFO TF) N°706Q 

ljJ_/V^ llV IN / \JU27 


1 RQ1 1 09 
X 07 1 1 uz 


1 RQ91 4R 
1 oyz xho 


1550 2 

j ddvj.z. 


GONTTGQ 

V-/V-/IN 1 1 VJ 27 


9FO TF) N°706Q 

OL/y 11.7 IN / VJVjjs 


1 RQ1Q44 


1 RQ51 99 

1 07J 1 zz 


1551 1 
j ^ j i . i 


GONTTGQ 

V-'V-ZIN 1 IVJ 27 


9FO TO N°706Q 

OJuy 11-7 IN /UU" 


1 RQ5156 


1 RQ5775 

1 0-7.7 1 1 D 


1552 2 


GONTTGQ 

V-^ V71N 1 1VJ-7 


<iFO TF) N°706Q 

ul3y XXJ IN I VJVJZ7 


1 RQ576R 

1 Oz7D I VJO 


1 RQ60Q4 

1 OzfVjVJzJ^-T 


16R1 1 


GONTTGR 

V>VJ1N 1 1VJO 


CFO TD N°7068 

OCy 1J_7 IN IVJVJO 


1 9QS40 

1 Z-7 DHVJ 


1 11 496 

XD X tZO 


17 1 
d i • x 


GONTTGQ 

^✓VJIN 1 IVJ -7 


cpn TD N°706Q 


44S61 


44R97 


3740.2   CONTIG9    SEQ ID N°7069   1025799   1027007
2327.1   CONTIG12   SEQ ID N°7072   10        987
3742.1   CONTIG12   SEQ ID N°7072   10        987
1944.1   CONTIG13   SEQ ID N°7073   999       1205
3793.1   CONTIG9    SEQ ID N°7069   1491228   1491776
3794.1   CONTIG9    SEQ ID N°7069   1490472   1491212
3795.4   CONTIG9    SEQ ID N°7069   1489860   1490459
3797.2   CONTIG9    SEQ ID N°7069   1921869   1923518
3798.1   CONTIG9    SEQ ID N°7069   1923682   1924194
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1799 1 


CONTTG9 

> J. 1 VJ J7 


cpn Tn •\r o 7060 

OJZ/VJJ l-L/ IN /U027 


1 Q9491 0 


1 094S94 

1 27Z.*T,JZ.*T 




CONTTG0 

V-'V/l^ 1 JLVJ27 


QPf) fn M o 7060 


1 Q94^^H 


1 096176 

1 27ZU.J / O 




PONTTHO 

1 1VJ27 


CPA Tn T\T°7060 


1 Q9£^1 SI 


1 077481 

1 27Z / HOJ 


1897 1 


POTsTTTHI 1 

V> V71N 1 1 VJ 1 J 


cpo rn xr°707i 

oE/y JUL/ IN /KJ/D 


zozz 


1119 


1890 9 


V^V/IN JL 1VJ 1 


<sFO Tn TvJ°7071 

OJCv^ ULJ IN IKJfD 


1107 


4SS7 


1847 1 


CONTTO0 


cpn Tn T\J°7060 


IH-oZO 1 o 


1 481911 


1848 1 


CONTTG0 

V' V71 > X AVJ27 


CPO TTj TsJ°7060 

OJDy 1J_J IN / UU27 


1 481 860 

i Ho 1 OOU 


1 489116 


1840 1 

J Ot27. J 


PONTTG0 

V71>l 1 1VJ27 


QFO TH "M°7060 

OJC/V^ l-L-' IN / WU27 


1 480971 


1 481 680 
1 to 1 OOO 


18S0 1 

J 0«JU. 1 


POTnJTTGO 

V^V/IN 1 1 VJ27 


cpn TH TsJ°7060 

OEy 1J_J IN f\J\jy 


1 470788 

It- / 27 / OO 


1 48091 6 
1 touz 1 0 


1800 1 

J 027U. 1 


GOTSTTTG1 1 

V^vJi > 1 1VJ 1 J 


vFf) TD "M°7071 

1LJ IN /\J/D 


70841 


91 641 
Z 1 ot 1 


10 1 

j 27. i 


GONTTGO 


9FO TT> T\J°7060 

OJ-'V^ 1J-J IN / V/U27 


44006 


4S6SS 

tJUJJ 


8044 1 

OUtt. 1 


GOTsJTTG7 

V^V/IN JL 1 VJ / 


QUO Tr> TsJ°7067 

kJXZ/V^ 1JJ IN / WU / 


1 661 8 
1 DO 1 O 


1 78^0 

1 / OD\J 


1086 9 


v^- V/l > ± i\jJ 


QFO TD M°706S 

OJ-yV^/ 1J_J IN / VJUw? 


z 


70S0 

Z27J27 


41 94 9 


GOTsITTGS 


cfo TF> ]sj°706S 


z 


90S0 

Z27 J27 


1007 1 

.J 2727 1 . 1 


GOT\FTTO0 

V-- V7J > 1 1VJ27 


^sFO TD TsJ°7060 

Ol-rVJJ IU IN / V7U27 


1 090089 


1 091 801 

1 yz 1 027J 


40 1 


go>jttoo 

V^wlM 1 1VJ27 


cpn TD M°706Q 


4^816 


tOOJO 


401 8 1 

tu 1 0 . 1 


GOXTTTO0 

V^Ul\ 1 lvJ27 


cpn TTj TsJ°7060 

OX2/VJJ 1-LV IN / 


1 27 1 JH-OH- 


1 01 4610 

1 27 1 tOJ>U 


401 0 1 


v^UlN 1 1U7 


cpf\ m T\J°706Q 

ljJCv^ JLU 1 IN /U027 


1 01 9844 

1 27 IZO't't 


1 27 1 Jt / O 


4099 9 


GOXTTTOO 


cpn m 1SJ°7060 

OJDVjJ JUL/ IN /UU27 


1 01 1 069 

1 27 1 1 VJOZ 


1 01 1 17£ 

1 27 1 1 J 1 O 


4064 1 


POTnJTTGO 


oJclv^ JLL/ IN /U027 


14RA919 
1 H-oOZ 1 Z 


1 toooo 1 


406 S 1 


POXTTTOO 


cpn m T\T°7060 

O C/VJJ JLL/ IN / KJVy 


i'tojyo / 


1 48681 9 
1 tooo 1 z 


4066 1 




cpo Tn TSJ°70^Q 
oJC/V^ JUL/ IN /U027 


1 48AR£H 


1 to /too 


4067 4 

tUO / .t 


PO>JTTG0 


cpn rr\ 1SJ°7060 


1 487696 
lH-O / OZO 


1 48871 7 
1 too ill 


41 1 
t 1.1 


PO>JTTO0 

1 1 VJ 27 


cpo Tn >J°7060 
ojj/Vjj iJ_j in / \j\jy 


460S0 

H"U27>2»U 


t / oJ> / 


41 1 


GOTsJTTGO 


OLy 1U IN / UU27 


48187 


S0160 

*70,J027 


4016 1 


GOISJTTOO 

vwlN 1 1VJ27 


cpo in >J°7060 

OXJ/V^ 1-L7 IN / UU27 


1 60S1 70 

1 UUJ 1 / \J 


1 60S47S 

I OU Jt / «j 


S04 1 


GOl\JTTO0 


cpn m 7\J°7060 

OI-zV^ i-L/ IN / W\J27 


471940 


4747^7 
t / 1 / .j / 


S06 1 

JvU. 1 


OOTsTTTGO 

V^V/IN 1 1 VJ 27 


QFO m TsJ°7060 


47S061 


4764S8 

t / UtJO 


S07 1 

D\J 1 . 1 


POTNJTTGO 


vpn m >J o 7060 


476460 


4771^0 


S08 1 

dkjo . 1 


POTnJTTGO 

V^v_71 N 1 1 VJ27 


9FO Tn TsJ°7060 

OJCfV^ 1J_7 1>I / \j\jy 


477S4S 

*T / / J*T*7 


477874 
t / / 0 / 1 


SI 8 1 

D 1 O. 1 


GOTsJTTGO 

V_^VJI> 1 1 VJ 27 


QFO Tn TsJ°7060 

OJ_/VJ^ li-/ 1>I /\7U27 


48S01 1 

T"0-7 27»2» 1 


487941 
to / zt 1 


S10 1 


OOT\JTTO0 

V/v7l> 1 1 VJ 27 


CFO Tn 1\J°7060 


4871 86 


487464 
to / tot 


S91 1 

_/Z. 1.1 


GOl\JTTO0 

V> V71 > 1 1 VJ27 


QFO m 1nT°7060 

OCy lJJ IN / \J\jy 


487S07 


487080 

tO / 27027 


660 1 

OU27. 1 


GOTsJTTGO 

VV/1>I 1 1VJ27 


CFO m 1\T°7060 
OJ-/VJJ iJ-j in / vjvjy 


6041 S4 


OU^>*J27»J 


6716 1 

U / JU. 1 


G01\JTT06 

V/W1N 1 1VJO 


cpf) m T\J°7066 

kjCv^ 1JJ IN /UUU 


1 894 


9^17 


6874 1 

OO / t. 1 


OOTsJTTG8 

JL 1 VJO 


CFO Tn >J°7068 

OJJ/V^ l-L/ IN /UOO 


S7S000 
0 / oyyy 


S7619S 


7S0 1 
/ .ju. i 


GOTsJTTGO 

V-'V^IN A 1 VJ27 


CFO ID TsJ°7060 


1 971 0^9 

1 Z / 1 27^Z 


1 974^1 0 

1 Z / 4J 1 27 


7S1 1 


OOT\TTTG0 

Vv/IN 1 1 VJ 27 


CFO m TsJ°7060 

OJDy 1JL7 IN / \J\jy 


1 971 177 
1Z / lj / / 


1 971 09^ 

1 Z / 1 OZj 


7S9 1 


GOTsJTTGO 

OVJi> ± 1VJ27 


CFO Tn 7\T°7060 

oXZ/V^ 1LJ IN / WU27 


1 970Q0S 

1Z / U27U-J 


1 971 9SS 

1 Z / 1 ADD 


760 1 


GOXTTTG0 

V^-V^/lM 1 1 VJ 27 


CFO Tn TsJ°7060 

O JL>V<^ 1-L7 IN / \J\jy 


1 766S74 


1 967910 

1 ZO /Z .J 27 


761 1 


OOl\JTTG0 

V/VJ1\ 1 1 VJ 27 


cpo Tn TnJ°7060 

1J_J IN /UU27 


1 966101 


1 966707 
1 zoo / u / 


764.1     CONTIG9    SEQ ID N°7069   1263381   1263866
765.1     CONTIG9    SEQ ID N°7069   1261771   1262709
774.1     CONTIG9    SEQ ID N°7069   1258317   1260068
778.1     CONTIG9    SEQ ID N°7069   1256902   1257267
8067.2    CONTIG9    SEQ ID N°7069   1488930   1489379
8073.2    CONTIG9    SEQ ID N°7069   1484592   1484990
10294.1   CONTIG8    SEQ ID N°7068   119556    120962
7817.1    CONTIG8    SEQ ID N°7068   119556    120962
8134.1    CONTIG8    SEQ ID N°7068   119556    120962
874.1     CONTIG9    SEQ ID N°7069   1177144   1178064
875.1     CONTIG9    SEQ ID N°7069   1176375   1176995
876.1     CONTIG9    SEQ ID N°7069   1175483   1176361
1769.1    CONTIG3    SEQ ID N°7063   21        716
9388.1    CONTIG3    SEQ ID N°7063   21        716
4934.1    CONTIG9    SEQ ID N°7069   1819527   1820453
2798.2    CONTIG8    SEQ ID N°7068   71193     77576
3630.3    CONTIG12   SEQ ID N°7072   11054     16705
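
As an illustration only, the position columns of the table above can be handled programmatically. The following Python sketch assumes that the two positions of a row are both inclusive nucleotide coordinates on the contig, which the table itself does not state; the names `ROWS`, `span_length` and `shared_spans` are hypothetical and not part of the invention. It computes the length of a span and lists ORF numbers that occupy exactly the same span of a contig, as 1769.1 and 9388.1 do on CONTIG3.

```python
from collections import defaultdict

# (ORF number, contig, SEQ ID, first position, second position),
# a few rows copied from the end of the table above.
ROWS = [
    ("874.1", "CONTIG9", 7069, 1177144, 1178064),
    ("1769.1", "CONTIG3", 7063, 21, 716),
    ("9388.1", "CONTIG3", 7063, 21, 716),
    ("2798.2", "CONTIG8", 7068, 71193, 77576),
]


def span_length(pos1, pos2):
    """Length in nucleotides, ASSUMING both positions are inclusive."""
    return abs(pos2 - pos1) + 1


def shared_spans(rows):
    """Group ORF numbers that occupy exactly the same span of a contig."""
    groups = defaultdict(list)
    for orf, contig, _seq_id, pos1, pos2 in rows:
        groups[(contig, pos1, pos2)].append(orf)
    # Keep only spans listed under more than one ORF number.
    return {key: orfs for key, orfs in groups.items() if len(orfs) > 1}
```

Grouping on the triple (contig, first position, second position) rather than on the positions alone avoids merging ORFs that happen to have equal coordinates on different contigs.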



Table XII: Position of former contigs of the Paris strain on the genomic sequence of the chromosome of the Paris strain of sequence SEQ ID 3507 and of the plasmid of the Paris strain of sequence SEQ ID 3508

             pos1          pos2          former contig                  SEQ ID of former contig
chromosome   1             44600         41                             SEQ ID N°41
chromosome   44600         161000        56                             SEQ ID N°66
chromosome   162000        223000        54                             SEQ ID N°54
chromosome   223000        232000        39                             SEQ ID N°[illegible]
chromosome   236000        424000        49                             SEQ ID N°49
chromosome   437000        469000        37                             SEQ ID N°87
chromosome   [illegible]   [illegible]   53                             SEQ ID N°53
chromosome   758000        781000        45                             SEQ ID N°45
chromosome   789000        879000        43                             SEQ ID N°43
chromosome   883000        901000        36                             SEQ ID N°86
chromosome   898000        986000        42                             SEQ ID N°42
chromosome   990000        1160000       46                             SEQ ID N°46
chromosome   1160000       1214000       39                             SEQ ID N°89
chromosome   1214000       1352000       54                             SEQ ID N°54
chromosome   1352000       1670000       52                             SEQ ID N°52
chromosome   1670000       1736000       40                             SEQ ID N°40
chromosome   1736000       2040000       55                             SEQ ID N°65
chromosome   2044000       2093000       56                             SEQ ID N°66
chromosome   2093000       2204722       45                             SEQ ID N°45
chromosome   2204000       2298000       50                             SEQ ID N°60
chromosome   2298000       2656000       56                             SEQ ID N°66
chromosome   2656000       2740000       50                             SEQ ID N°[illegible]
chromosome   2740000       2753600       33                             SEQ ID N°83
chromosome   2753600       2954000       47                             SEQ ID N°47
chromosome   2954000       3178000       51                             SEQ ID N°61
chromosome   3178000       3289000       44                             SEQ ID N°44
chromosome   3289000       3449000       48                             SEQ ID N°48
chromosome   3449000       3463000       34                             SEQ ID N°84
chromosome   3463000       3503610       41                             SEQ ID N°41
plasmid      1             131900        55 (position 1 to 132400)      SEQ ID N°65

Table XIII: Correspondence of the numbers attributed to the chromosome and to the plasmid of the Paris and Lens strains with the SEQ ID numbers identified in the list of sequences

SeqID=3507   chromosome of the Paris strain
SeqID=3508   plasmid of the Paris strain
SeqID=6733   chromosome of the Lens strain
SeqID=6734   plasmid of the Lens strain

Table XIV: Correspondence of the numbers attributed to the genes of the Paris strain on its chromosome of sequence SEQ ID 3507 and on its plasmid of sequence SEQ ID 3508 with the SEQ ID numbers identified in the list of sequences, and position of the nucleic sequences coding these genes on the sequence of the chromosome and of the plasmid, with their putative function

Table XV: Nature of the class listed in the "Class" column in Tables XIV and XVI

1.      Cell envelope and cellular processes
1.1     Cell wall and outer membrane
1.2     Transport/binding proteins and lipoproteins
1.3     Sensors (signal transduction)
1.4     Membrane bioenergetics
1.5     Motility and chemotaxis
1.6     Protein secretion
1.7     Cell division
1.8     Cell surface structures and pili
2.      Intermediary metabolism
2.1     Metabolism of carbohydrates and related molecules
2.1.1   Specific pathways
2.1.2   Main glycolytic pathways
2.1.3   TCA cycle
2.2     Metabolism of amino acids and related molecules
2.3     Metabolism of nucleotides and nucleic acids
2.4     Metabolism of lipids
2.5     Metabolism of coenzymes and prosthetic groups
2.6     Metabolism of phosphate
3.      Information pathways
3.1     DNA replication
3.2     DNA repair and restriction/modification
3.3     DNA recombination
3.4     DNA packaging and segregation
3.5     RNA synthesis
3.5.1   Initiation
3.5.2   Regulation
3.5.3   Elongation
3.5.4   Termination
3.6     RNA modification
3.7     Protein synthesis
3.7.1   Ribosomal proteins
3.7.2   Aminoacyl-tRNA synthetases
3.7.3   Initiation
3.7.4   Elongation
3.7.5   Termination
3.8     Protein modification
3.9     Protein folding
4.      Other functions
4.1     Adaptation to atypical conditions
4.2     Detoxification
4.3     Toxins
4.4     Phage-related functions
4.5     Transposon, IS, plasmid
4.6     Miscellaneous
5.      Similar to unknown proteins
5.1     Of Legionella (similar though not the same)
5.2     Of other organisms
6.      No similarity

Similar to enzyme IIN of the XXXX-specific PTS system
Similar to transcriptional regulator (xx family)
Similar to ABC transporter (ATP-binding protein)
Similar to ABC transporter (permease)
Similar to ABC transporter (binding protein)
Similar to two-component response regulator
Similar to two-component sensor histidine kinase
Putative peptidoglycan-bound protein (LPXTG motif)

Table XVI: Correspondence of the numbers attributed to the genes of the Lens strain that are specific relative to the Paris and Philadelphia strains, on its chromosome of sequence SEQ ID 6733 and on its plasmid of sequence SEQ ID 6734, with the SEQ ID numbers identified in the list of sequences, and position of the nucleic sequences coding these genes on the sequence of the chromosome and the plasmid, with their putative function

Table XVII: List of the sequences specific to the Paris strain relative to the Lens and Philadelphia strains, with their Pasteur Institute "ORF" correspondence number and accession number in the gene banks



1056.1 SEQID=3544 EMBL_NAME=lpp1800
1067.1 SEQID=3551 EMBL_NAME=lpp1877
1069.2 SEQID=3552 EMBL_NAME=lpp1878
1076.3 SEQID=6591 EMBL_NAME=plpp0105
1077.1 SEQID=6592 EMBL_NAME=plpp0104
1078.2 SEQID=6593 EMBL_NAME=plpp0103
1080.2 SEQID=3558 EMBL_NAME=lpp3012
1081.2 SEQID=3559 EMBL_NAME=lpp3011
11.2 SEQID=3573 EMBL_NAME=lpp0196
114.2 SEQID=3598 EMBL_NAME=lpp2957
115.2 SEQID=3603 EMBL_NAME=lpp2956
116.1 SEQID=3609 EMBL_NAME=lpp2955
1160.3 SEQID=3610 EMBL_NAME=lpp0125
1171.4 SEQID=6594 EMBL_NAME=plpp0017
1172.2 SEQID=6595 EMBL_NAME=plpp0018
118.1 SEQID=3618 EMBL_NAME=lpp2954
1183.4 SEQID=3621 EMBL_NAME=lpp0356
1213.2 SEQID=3638 EMBL_NAME=lpp0257
1235.3 SEQID=6598 EMBL_NAME=plpp0121
1237.2 SEQID=6600 EMBL_NAME=plpp0119
1299.3 SEQID=6601 EMBL_NAME=plpp0036
13.1 SEQID=3688 EMBL_NAME=lpp0195
1342.3 SEQID=6602 EMBL_NAME=plpp0034
1344.4 SEQID=6603 EMBL_NAME=plpp0033
1362.3 SEQID=6604 EMBL_NAME=plpp0098
1364.2 SEQID=6605 EMBL_NAME=plpp0099
1372.2 SEQID=3726 EMBL_NAME=lpp2385
1373.1 SEQID=3727 EMBL_NAME=lpp2384
1375.2 SEQID=3728 EMBL_NAME=lpp2383
1376.2 SEQID=3729 EMBL_NAME=lpp2382
1387.2 SEQID=3735 EMBL_NAME=lpp0079
1388.2 SEQID=3736 EMBL_NAME=lpp0080
139.6 SEQID=3737 EMBL_NAME=lpp1100
1392.2 SEQID=3740 EMBL_NAME=lpp1097
1394.2 SEQID=3741 EMBL_NAME=lpp2557
1429.4 SEQID=3766 EMBL_NAME=lpp2442
1522.2 SEQID=6606 EMBL_NAME=plpp0127
1523.2 SEQID=6607 EMBL_NAME=plpp0128
1524.3 SEQID=6608 EMBL_NAME=plpp0129
1566.3 SEQID=3846 EMBL_NAME=lpp2490
1570.4 SEQID=3850 EMBL_NAME=lpp0077
1599.5 SEQID=6610 EMBL_NAME=plpp0039
1623.4 SEQID=3881 EMBL_NAME=lpp2394
1624.5 SEQID=3882 EMBL_NAME=lpp2395
163.1 SEQID=3887 EMBL_NAME=lpp2344
1655.2 SEQID=3905 EMBL_NAME=lpp1895
1683.2 SEQID=3923 EMBL_NAME=lpp0046
172.1 SEQID=3946 EMBL_NAME=lpp2978
1735.1 SEQID=3955 EMBL_NAME=lpp1862
1761.4 SEQID=6611 EMBL_NAME=plpp0074
1779.3 SEQID=3985 EMBL_NAME=lpp2040
1787.4 SEQID=3992 EMBL_NAME=lpp1824
18.1 SEQID=4000 EMBL_NAME=lpp0192
1803.2 SEQID=4003 EMBL_NAME=lpp2374
1815.2 SEQID=4010 EMBL_NAME=lpp2986
1848.5 SEQID=6613 EMBL_NAME=plpp0124
1849.4 SEQID=6614 EMBL_NAME=plpp0125
1852.2 SEQID=6615 EMBL_NAME=plpp0126
1891.2 SEQID=4056 EMBL_NAME=lpp3007
19.1 SEQID=4061 EMBL_NAME=lpp0191
1920.3 SEQID=4074 EMBL_NAME=lpp2405
1923.2 SEQID=4075 EMBL_NAME=lpp2406
1924.4 SEQID=4076 EMBL_NAME=lpp2407
2.1 SEQID=4112 EMBL_NAME=lpp0163
20.1 SEQID=4113 EMBL_NAME=lpp0190
2006.2 SEQID=6616 EMBL_NAME=plpp0032
2027.2 SEQID=4125 EMBL_NAME=lpp1910
2030.2 SEQID=4127 EMBL_NAME=lpp0241
2051.2 SEQID=6617 EMBL_NAME=plpp0035
2078.2 SEQID=4152 EMBL_NAME=lpp2456
2112.2 SEQID=4167 EMBL_NAME=lpp3006
2153.3 SEQID=4196 EMBL_NAME=lpp2427
2172.2 SEQID=6619 EMBL_NAME=plpp0107
2173.2 SEQID=6620 EMBL_NAME=plpp0106
2174.1 SEQID=6621 EMBL_NAME=plpp0102
2175.3 SEQID=6622 EMBL_NAME=plpp0101
2176.3 SEQID=6623 EMBL_NAME=plpp0139
2177.2 SEQID=6624 EMBL_NAME=plpp0140
220.1 SEQID=4215 EMBL_NAME=lpp2859
221.1 SEQID=4220 EMBL_NAME=lpp2860
2212.4 SEQID=6626 EMBL_NAME=plpp0038
222.1 SEQID=4223 EMBL_NAME=lpp2861
223.1 SEQID=4227 EMBL_NAME=lpp2862
2239.3 SEQID=4233 EMBL_NAME=lpp1330
2244.2 SEQID=6627 EMBL_NAME=plpp0052
2245.2 SEQID=6628 EMBL_NAME=plpp0053
2247.7 SEQID=6629 EMBL_NAME=plpp0054
2258.2 SEQID=6630 EMBL_NAME=plpp0130
2260.2 SEQID=6631 EMBL_NAME=plpp0131
2270.2 SEQID=4248 EMBL_NAME=lpp0082
2271.1 SEQID=4249 EMBL_NAME=lpp0081
2351.2 SEQID=4293 EMBL_NAME=lpp3016
2367.2 SEQID=4306 EMBL_NAME=lpp2386
2374.2 SEQID=4312 EMBL_NAME=lpp2987
24.1 SEQID=4325 EMBL_NAME=lpp0189
2414.2 SEQID=4338 EMBL_NAME=lpp2879
2428.2 SEQID=4349 EMBL_NAME=lpp1870
2438.2 SEQID=4356 EMBL_NAME=lpp0074
2439.2 SEQID=4357 EMBL_NAME=lpp0073
2441.2 SEQID=4359 EMBL_NAME=lpp0072
2461.2 SEQID=4373 EMBL_NAME=lpp0769
2462.2 SEQID=4374 EMBL_NAME=lpp0770
2464.2 SEQID=4375 EMBL_NAME=lpp0771
2483.2 SEQID=4389 EMBL_NAME=lpp0052
2488.2 SEQID=4392 EMBL_NAME=lpp0336
2489.2 SEQID=4393 EMBL_NAME=lpp0337
2490.2 SEQID=4395 EMBL_NAME=lpp0338
25.1 SEQID=4399 EMBL_NAME=lpp0188
254.2 SEQID=6633 EMBL_NAME=plpp0028
255.1 SEQID=6634 EMBL_NAME=plpp0029
2555.5 SEQID=4432 EMBL_NAME=lpp2147
256.1 SEQID=6635 EMBL_NAME=plpp0030
2573.3 SEQID=6636 EMBL_NAME=plpp0094
258.2 SEQID=6637 EMBL_NAME=plpp0031
26.1 SEQID=4457 EMBL_NAME=lpp0187
2605.3 SEQID=4461 EMBL_NAME=lpp0198
2616.1 SEQID=4464 EMBL_NAME=lpp0201
2622.1 SEQID=4468 EMBL_NAME=lpp0197
2625.2 SEQID=4469 EMBL_NAME=lpp2376
2626.1 SEQID=4470 EMBL_NAME=lpp2375
2659.3 SEQID=6638 EMBL_NAME=plpp0022
2660.1 SEQID=6639 EMBL_NAME=plpp0023
2661.2 SEQID=6640 EMBL_NAME=plpp0024
2662.2 SEQID=6641 EMBL_NAME=plpp0025
2665.1 SEQID=6642 EMBL_NAME=plpp0040
2666.1 SEQID=6643 EMBL_NAME=plpp0041
2667.2 SEQID=6644 EMBL_NAME=plpp0042
27.1 SEQID=4506 EMBL_NAME=lpp0186
2712.1 SEQID=4515 EMBL_NAME=lpp2620
2713.2 SEQID=4516 EMBL_NAME=lpp2621
2726.1 SEQID=4522 EMBL_NAME=lpp2636
2727.2 SEQID=4523 EMBL_NAME=lpp0256
2767.1 SEQID=4547 EMBL_NAME=lpp1561
2815.2 SEQID=4575 EMBL_NAME=lpp3070
2830.3 SEQID=4586 EMBL_NAME=lpp1928
2873.1 SEQID=6645 EMBL_NAME=plpp0109
2874.1 SEQID=6646 EMBL_NAME=plpp0108
2877.2 SEQID=6647 EMBL_NAME=plpp0100
2878.2 SEQID=6648 EMBL_NAME=plpp0097
2879.4 SEQID=6649 EMBL_NAME=plpp0096
2881.3 SEQID=6650 EMBL_NAME=plpp0095
29.1 SEQID=4611 EMBL_NAME=lpp0185
2919.1 SEQID=4624 EMBL_NAME=lpp0144
2932.3 SEQID=4633 EMBL_NAME=lpp2880
2951.2 SEQID=4643 EMBL_NAME=lpp2139
2972.3 SEQID=4663 EMBL_NAME=lpp2122
2973.2 SEQID=4664 EMBL_NAME=lpp2121
2975.2 SEQID=4665 EMBL_NAME=lpp2120
2976.1 SEQID=4666 EMBL_NAME=lpp2119
2978.1 SEQID=4667 EMBL_NAME=lpp2118
2982.1 SEQID=4668 EMBL_NAME=lpp2117
2983.1 SEQID=4669 EMBL_NAME=lpp2116
2985.1 SEQID=4670 EMBL_NAME=lpp2115
2986.1 SEQID=4671 EMBL_NAME=lpp2114
2987.1 SEQID=4672 EMBL_NAME=lpp2113
2988.1 SEQID=4673 EMBL_NAME=lpp2112
2990.1 SEQID=4674 EMBL_NAME=lpp2111
3.1 SEQID=4683 EMBL_NAME=lpp0162
30.1 SEQID=4684 EMBL_NAME=lpp0184
3005.2 SEQID=4689 EMBL_NAME=lpp2983
3009.2 SEQID=4691 EMBL_NAME=lpp2968
3011.2 SEQID=4692 EMBL_NAME=lpp0587
3016.1 SEQID=4696 EMBL_NAME=lpp0583
3017.1 SEQID=4697 EMBL_NAME=lpp0582
3023.1 SEQID=4700 EMBL_NAME=lpp0576
3030.2 SEQID=6651 EMBL_NAME=plpp0132
3031.2 SEQID=6652 EMBL_NAME=plpp0133
3035.2 SEQID=6653 EMBL_NAME=plpp0134
3036.2 SEQID=6654 EMBL_NAME=plpp0138
3101.1 SEQID=4743 EMBL_NAME=lpp0303
3109.2 SEQID=4748 EMBL_NAME=lpp0297
3110.1 SEQID=4749 EMBL_NAME=lpp0296
3113.1 SEQID=4750 EMBL_NAME=lpp0295
3114.2 SEQID=4751 EMBL_NAME=lpp0294
3116.2 SEQID=4752 EMBL_NAME=lpp0293
3134.1 SEQID=6655 EMBL_NAME=plpp0072
3135.1 SEQID=6656 EMBL_NAME=plpp0071
3136.1 SEQID=6657 EMBL_NAME=plpp0065
3137.1 SEQID=6658 EMBL_NAME=plpp0064
3138.2 SEQID=6659 EMBL_NAME=plpp0063
3231.1 SEQID=4815 EMBL_NAME=lpp2319
3232.1 SEQID=4816 EMBL_NAME=lpp2318
3233.1 SEQID=4817 EMBL_NAME=lpp2317
3234.3 SEQID=4818 EMBL_NAME=lpp2316
3257.2 SEQID=4835 EMBL_NAME=lpp2409
3258.2 SEQID=4836 EMBL_NAME=lpp2408
3299.4 SEQID=4860 EMBL_NAME=lpp2179
3338.1 SEQID=4884 EMBL_NAME=lpp2484
3343.1 SEQID=4887 EMBL_NAME=lpp2486
3348.3 SEQID=6662 EMBL_NAME=plpp0078
3349.3 SEQID=6663 EMBL_NAME=plpp0079
3350.3 SEQID=6664 EMBL_NAME=plpp0080
3351.3 SEQID=6665 EMBL_NAME=plpp0081
3352.1 SEQID=6666 EMBL_NAME=plpp0082
3353.1 SEQID=6667 EMBL_NAME=plpp0083
3354.1 SEQID=6668 EMBL_NAME=plpp0084
3355.2 SEQID=6669 EMBL_NAME=plpp0089
3356.2 SEQID=6670 EMBL_NAME=plpp0090
3357.2 SEQID=6671 EMBL_NAME=plpp0091
3394.1 SEQID=4910 EMBL_NAME=lpp1041
34.1 SEQID=4916 EMBL_NAME=lpp0182
3406.2 SEQID=4920 EMBL_NAME=lpp1050
3411.2 SEQID=4922 EMBL_NAME=lpp2059
3466.1 SEQID=4957 EMBL_NAME=lpp2864
3486.1 SEQID=4969 EMBL_NAME=lpp1110
3498.1 SEQID=6672 EMBL_NAME=plpp0019
3530.1 SEQID=4998 EMBL_NAME=lpp1823
3532.1 SEQID=4999 EMBL_NAME=lpp1822
3584.1 SEQID=5034 EMBL_NAME=lpp1098
3631.4 SEQID=6675 EMBL_NAME=plpp0058
3632.3 SEQID=6676 EMBL_NAME=plpp0051
3653.3 SEQID=5083 EMBL_NAME=lpp2412
3655.1 SEQID=5084 EMBL_NAME=lpp2413
3658.1 SEQID=5086 EMBL_NAME=lpp2420
3659.2 SEQID=5087 EMBL_NAME=lpp2424
3661.4 SEQID=5089 EMBL_NAME=lpp2425
3664.1 SEQID=5091 EMBL_NAME=lpp3018
3665.1 SEQID=5092 EMBL_NAME=lpp3019
3666.2 SEQID=5093 EMBL_NAME=lpp3020
3729.1 SEQID=5130 EMBL_NAME=lpp0204
3730.1 SEQID=5131 EMBL_NAME=lpp0205
3731.1 SEQID=5132 EMBL_NAME=lpp0206
3732.1 SEQID=5133 EMBL_NAME=lpp0207
3735.2 SEQID=5135 EMBL_NAME=lpp0209
3764.3 SEQID=5153 EMBL_NAME=lpp1868
3851.1 SEQID=5203 EMBL_NAME=lpp1492
3887.1 SEQID=5217 EMBL_NAME=lpp2439
3888.1 SEQID=5218 EMBL_NAME=lpp2438
3889.1 SEQID=5219 EMBL_NAME=lpp2437
3890.2 SEQID=5221 EMBL_NAME=lpp2436
3891.2 SEQID=5222 EMBL_NAME=lpp2435
399.2 SEQID=5282 EMBL_NAME=lpp1560
3993.2 SEQID=5284 EMBL_NAME=lpp1640
4.1 SEQID=5288 EMBL_NAME=lpp0161
400.1 SEQID=5289 EMBL_NAME=lpp1559
401.2 SEQID=5296 EMBL_NAME=lpp1558
4037.2 SEQID=5314 EMBL_NAME=lpp0065
4096.1 SEQID=5355 EMBL_NAME=lpp0772
4097.1 SEQID=5356 EMBL_NAME=lpp0773
4098.1 SEQID=5357 EMBL_NAME=lpp0774
4101.1 SEQID=5359 EMBL_NAME=lpp0775
4104.2 SEQID=5360 EMBL_NAME=lpp2390
4105.1 SEQID=5361 EMBL_NAME=lpp2389
4106.1 SEQID=5362 EMBL_NAME=lpp2388
4107.1 SEQID=5363 EMBL_NAME=lpp2387
4108.1 SEQID=5364 EMBL_NAME=lpp2381
4109.2 SEQID=5365 EMBL_NAME=lpp2380
4123.2 SEQID=5374 EMBL_NAME=lpp1077
4127.1 SEQID=5376 EMBL_NAME=lpp1075
4128.1 SEQID=5377 EMBL_NAME=lpp1074
4129.1 SEQID=5378 EMBL_NAME=lpp1073
413.5 SEQID=5379 EMBL_NAME=lpp1905
4130.3 SEQID=5380 EMBL_NAME=lpp1072
4144.3 SEQID=5393 EMBL_NAME=lpp0412
415.2 SEQID=5397 EMBL_NAME=lpp1906
4179.2 SEQID=5420 EMBL_NAME=lpp1615
4209.2 SEQID=5437 EMBL_NAME=lpp1566
4233.4 SEQID=5452 EMBL_NAME=lpp2396
4234.4 SEQID=5453 EMBL_NAME=lpp2397
4238.1 SEQID=5455 EMBL_NAME=lpp2399
4239.4 SEQID=5456 EMBL_NAME=lpp2400
4244.1 SEQID=5460 EMBL_NAME=lpp1407
4248.1 SEQID=5463 EMBL_NAME=lpp3010
425.1 SEQID=5465 EMBL_NAME=lpp2430
4251.2 SEQID=5466 EMBL_NAME=lpp3008
426.3 SEQID=5470 EMBL_NAME=lpp2428
4281.2 SEQID=5481 EMBL_NAME=lpp2047
4282.3 SEQID=5482 EMBL_NAME=lpp2046
4284.3 SEQID=5483 EMBL_NAME=lpp2045
4285.1 SEQID=5484 EMBL_NAME=lpp2044
4421.2 SEQID=5579 EMBL_NAME=lpp0211
4422.1 SEQID=5580 EMBL_NAME=lpp0212
4424.1 SEQID=5581 EMBL_NAME=lpp0213
4479.2 SEQID=5614 EMBL_NAME=lpp1573
4481.1 SEQID=6677 EMBL_NAME=plpp0050
4482.1 SEQID=6678 EMBL_NAME=plpp0049
4483.2 SEQID=6679 EMBL_NAME=plpp0048
4490.2 SEQID=5622 EMBL_NAME=lpp0068
4491.3 SEQID=5623 EMBL_NAME=lpp0069
4492.3 SEQID=5624 EMBL_NAME=lpp0070
4493.1 SEQID=5625 EMBL_NAME=lpp0071
4496.2 SEQID=5626 EMBL_NAME=lpp0075
4500.1 SEQID=5631 EMBL_NAME=lpp2085
4516.3 SEQID=5639 EMBL_NAME=lpp0078
4533.2 SEQID=5654 EMBL_NAME=lpp0779
4546.3 SEQID=5661 EMBL_NAME=lpp2967
4549.2 SEQID=6680 EMBL_NAME=plpp0118
4553.2 SEQID=5665 EMBL_NAME=lpp0291
4636.4 SEQID=5715 EMBL_NAME=lpp2168
4660.2 SEQID=5733 EMBL_NAME=lpp1956
474.2 SEQID=5787 EMBL_NAME=lpp0777
475.1 SEQID=5794 EMBL_NAME=lpp0778
4770.1 SEQID=5809 EMBL_NAME=lpp0439
4785.1 SEQID=5820 EMBL_NAME=lpp1625
4822.3 SEQID=5848 EMBL_NAME=lpp2393
485.3 SEQID=6681 EMBL_NAME=plpp0026

486.2 SEQID=6682 EMBL_NAME=plpp0027
4860.1 SEQID=5865 EMBL_NAME=lpp0983
489.1 SEQID=5886 EMBL_NAME=lpp0456
4913.2 SEQID=5896 EMBL_NAME=lpp1054
4914.3 SEQID=5897 EMBL_NAME=lpp1053
4927.1 SEQID=5905 EMBL_NAME=lpp1904
4945.2 SEQID=6683 EMBL_NAME=plpp0003
4947.2 SEQID=6684 EMBL_NAME=plpp0002
502.3 SEQID=5956 EMBL_NAME=lpp1122
5037.2 SEQID=5967 EMBL_NAME=lpp3077
505.3 SEQID=5976 EMBL_NAME=lpp2418
507.3 SEQID=5992 EMBL_NAME=lpp2416
5078.2 SEQID=6685 EMBL_NAME=plpp0004
5081.5 SEQID=6687 EMBL_NAME=plpp0005
5088.2 SEQID=6000 EMBL_NAME=lpp1576
509.1 SEQID=6001 EMBL_NAME=lpp2415
510.2 SEQID=6008 EMBL_NAME=lpp2414
5113.3 SEQID=6689 EMBL_NAME=plpp0061
5114.3 SEQID=6690 EMBL_NAME=plpp0062
5146.2 SEQID=6022 EMBL_NAME=lpp0292
5151.1 SEQID=6692 EMBL_NAME=plpp0123
5152.1 SEQID=6693 EMBL_NAME=plpp0122
5153.1 SEQID=6694 EMBL_NAME=plpp0093
5154.2 SEQID=6695 EMBL_NAME=plpp0092
52.1 SEQID=6035 EMBL_NAME=lpp0168
5201.2 SEQID=6038 EMBL_NAME=lpp1219
5202.2 SEQID=6039 EMBL_NAME=lpp1220
5204.2 SEQID=6040 EMBL_NAME=lpp1221
5208.1 SEQID=6042 EMBL_NAME=lpp2984
5224.2 SEQID=6045 EMBL_NAME=lpp1929
5242.3 SEQID=6054 EMBL_NAME=lpp2125
5288.1 SEQID=6069 EMBL_NAME=lpp0588
5297.2 SEQID=6073 EMBL_NAME=lpp2492
5307.3 SEQID=6075 EMBL_NAME=lpp1682
5321.2 SEQID=6086 EMBL_NAME=lpp1052
5322.2 SEQID=6087 EMBL_NAME=lpp1051
5387.1 SEQID=6109 EMBL_NAME=lpp0083
5390.1 SEQID=6112 EMBL_NAME=lpp0089
5474.2 SEQID=6145 EMBL_NAME=lpp2315
5517.1 SEQID=6167 EMBL_NAME=lpp2391
5554.3 SEQID=6187 EMBL_NAME=lpp1216
5555.3 SEQID=6188 EMBL_NAME=lpp1217
5576.3 SEQID=6197 EMBL_NAME=lpp2401
5605.2 SEQID=6211 EMBL_NAME=lpp0076
5621.1 SEQID=6219 EMBL_NAME=lpp2433
5623.2 SEQID=6220 EMBL_NAME=lpp2434
5626.3 SEQID=6221 EMBL_NAME=lpp2404
5634.2 SEQID=6224 EMBL_NAME=lpp0432 
5660.1 SEQID=6239 EMBL_NAME=lpp1577
567.6 SEQID=6696 EMBL_NAME=plpp0037
5675.2 SEQID=6243 EMBL_NAME=lpp2123
5677.2 SEQID=6244 EMBL_NAME=lpp2124
5688.2 SEQID=6249 EMBL_NAME=lpp2587
577.2 SEQID=6270 EMBL_NAME=lpp0203
5812.1 SEQID=6278 EMBL_NAME=lpp2377
5813.1 SEQID=6279 EMBL_NAME=lpp2378
5814.1 SEQID=6280 EMBL_NAME=lpp2379
5822.1 SEQID=6282 EMBL_NAME=lpp2863
5844.1 SEQID=6697 EMBL_NAME=plpp0047
5846.2 SEQID=6698 EMBL_NAME=plpp0057
5874.2 SEQID=6289 EMBL_NAME=lpp1563
5890.1 SEQID=6292 EMBL_NAME=lpp2411
59.1 SEQID=6298 EMBL_NAME=lpp0164
591.4 SEQID=6304 EMBL_NAME=lpp2885
5987.2 SEQID=6315 EMBL_NAME=lpp2403
6015.1 SEQID=6699 EMBL_NAME=plpp0060
6016.1 SEQID=6700 EMBL_NAME=plpp0059
6029.1 SEQID=6321 EMBL_NAME=lpp0543
6052.1 SEQID=6326 EMBL_NAME=lpp1218
6071.1 SEQID=6329 EMBL_NAME=lpp0290
6072.1 SEQID=6330 EMBL_NAME=lpp0289
6117.2 SEQID=6335 EMBL_NAME=lpp0202
616.5 SEQID=6340 EMBL_NAME=lpp2884
6165.1 SEQID=6342 EMBL_NAME=lpp2245a
6217.2 SEQID=6352 EMBL_NAME=lpp1564
6224.3 SEQID=6701 EMBL_NAME=plpp0021
6227.1 SEQID=6702 EMBL_NAME=plpp0008
6229.1 SEQID=6703 EMBL_NAME=plpp0015
6230.1 SEQID=6704 EMBL_NAME=plpp0016
6231.3 SEQID=6705 EMBL_NAME=plpp0020
6263.1 SEQID=6357 EMBL_NAME=lpp0298
6273.1 SEQID=6359 EMBL_NAME=lpp3045
6314.1 SEQID=6365 EMBL_NAME=lpp1908
6429.1 SEQID=6706 EMBL_NAME=plpp0141a
659.2 SEQID=6707 EMBL_NAME=plpp0135
660.1 SEQID=6708 EMBL_NAME=plpp0136
661.3 SEQID=6709 EMBL_NAME=plpp0137
678.3 SEQID=6398 EMBL_NAME=lpp1081
680.3 SEQID=6399 EMBL_NAME=lpp1080
681.4 SEQID=6400 EMBL_NAME=lpp1079
682.3 SEQID=6401 EMBL_NAME=lpp1078
7.1 SEQID=6412 EMBL_NAME=lpp0160
713.2 SEQID=6710 EMBL_NAME=plpp0046
715.1 SEQID=6711 EMBL_NAME=plpp0045
716.1 SEQID=6712 EMBL_NAME=plpp0044
717.3 SEQID=6713 EMBL_NAME=plpp0043
737.2 SEQID=6714 EMBL_NAME=plpp0141b
739.5 SEQID=6715 EMBL_NAME=plpp0001
750.2 SEQID=6437 EMBL_NAME=lpp2613
752.2 SEQID=6438 EMBL_NAME=lpp2614
761.2 SEQID=6716 EMBL_NAME=plpp0110
762.2 SEQID=6717 EMBL_NAME=plpp0111
764.1 SEQID=6718 EMBL_NAME=plpp0112
765.1 SEQID=6719 EMBL_NAME=plpp0113
766.4 SEQID=6720 EMBL_NAME=plpp0114
783.2 SEQID=6721 EMBL_NAME=plpp0055
785.4 SEQID=6722 EMBL_NAME=plpp0056
789.4 SEQID=6456 EMBL_NAME=lpp2245b
796.2 SEQID=6723 EMBL_NAME=plpp0068
862.1 SEQID=6496 EMBL_NAME=lpp2057
863.1 SEQID=6497 EMBL_NAME=lpp2056
875.5 SEQID=6505 EMBL_NAME=lpp2314
876.1 SEQID=6506 EMBL_NAME=lpp2313
879.3 SEQID=6507 EMBL_NAME=lpp2312
884.3 SEQID=6726 EMBL_NAME=plpp0117
885.3 SEQID=6727 EMBL_NAME=plpp0116
886.4 SEQID=6728 EMBL_NAME=plpp0115
905.2 SEQID=6527 EMBL_NAME=lpp2473
946.3 SEQID=6555 EMBL_NAME=lpp2423
947.3 SEQID=6556 EMBL_NAME=lpp2422
948.3 SEQID=6557 EMBL_NAME=lpp2421
956.2 SEQID=6729 EMBL_NAME=plpp0088
957.2 SEQID=6730 EMBL_NAME=plpp0087
959.2 SEQID=6731 EMBL_NAME=plpp0086
960.2 SEQID=6732 EMBL_NAME=plpp0085
99.1 SEQID=6582 EMBL_NAME=lpp2365



Table XVIII: List of the sequences specific to the Lens strain relative to the Paris and
Philadelphia strains, with their Institut Pasteur « ORF » correspondence number and
accession number in the gene banks






1001.1 SEQID=6735 EMBL_NAME=lpl1928
1002.1 SEQID=6736 EMBL_NAME=lpl1927
1003.1 SEQID=6737 EMBL_NAME=lpl1926
102.1 SEQID=6738 EMBL_NAME=lpl0183
1020.1 SEQID=6739 EMBL_NAME=lpl1915
103.1 SEQID=6740 EMBL_NAME=lpl0182
1040.1 SEQID=6742 EMBL_NAME=lpl1900
1041.1 SEQID=6743 EMBL_NAME=lpl1899
1045.1 SEQID=6745 EMBL_NAME=lpl1896
1047.1 SEQID=6746 EMBL_NAME=lpl1895
1048.1 SEQID=6747 EMBL_NAME=lpl1894
1049.2 SEQID=6748 EMBL_NAME=lpl0064
105.1 SEQID=6749 EMBL_NAME=lpl0181
1050.1 SEQID=6750 EMBL_NAME=lpl0063
1059.1 SEQID=6751 EMBL_NAME=lpl0057
106.1 SEQID=6752 EMBL_NAME=lpl0180
110.1 SEQID=6754 EMBL_NAME=lpl0177
1105.2 SEQID=6755 EMBL_NAME=lpl0019
111.1 SEQID=6756 EMBL_NAME=lpl0176
112.1 SEQID=6757 EMBL_NAME=lpl0175
113.1 SEQID=6758 EMBL_NAME=lpl0174
115.1 SEQID=6760 EMBL_NAME=lpl0173
116.1 SEQID=6761 EMBL_NAME=lpl0172
1204.1 SEQID=6763 EMBL_NAME=lpl2879
1206.1 SEQID=6764 EMBL_NAME=lpl2877
1207.1 SEQID=6765 EMBL_NAME=lpl2876
1209.1 SEQID=6766 EMBL_NAME=lpl2875
1217.1 SEQID=6767 EMBL_NAME=lpl2869
1219.1 SEQID=6768 EMBL_NAME=lpl2868
1221.1 SEQID=6769 EMBL_NAME=lpl2867
1237.2 SEQID=6770 EMBL_NAME=lpl2857
1251.1 SEQID=6773 EMBL_NAME=lpl2848
1252.1 SEQID=6774 EMBL_NAME=lpl2847
1255.2 SEQID=6775 EMBL_NAME=lpl2845
1258.1 SEQID=6776 EMBL_NAME=lpl2843
126.1 SEQID=6777 EMBL_NAME=lpl0165
127.1 SEQID=6778 EMBL_NAME=lpl0164
1274.1 SEQID=6779 EMBL_NAME=lpl2842
1275.1 SEQID=6780 EMBL_NAME=lpl2841
1276.1 SEQID=6781 EMBL_NAME=lpl2840
1278.1 SEQID=6782 EMBL_NAME=lpl2839
1279.1 SEQID=6783 EMBL_NAME=lpl2838
1280.1 SEQID=6784 EMBL_NAME=lpl2837
1283.1 SEQID=6786 EMBL_NAME=lpl2835
1284.1 SEQID=6787 EMBL_NAME=lpl2834
1285.1 SEQID=6788 EMBL_NAME=lpl2833
1296.1 SEQID=6789 EMBL_NAME=lpl2827
1297.1 SEQID=6790 EMBL_NAME=lpl2826
1321.1 SEQID=6791 EMBL_NAME=lpl2806
1422.1 SEQID=6792 EMBL_NAME=lpl2741
1535.1 SEQID=6795 EMBL_NAME=lpl0612
154.1 SEQID=6796 EMBL_NAME=lpl0146
156.1 SEQID=6797 EMBL_NAME=lpl0145
157.1 SEQID=6798 EMBL_NAME=lpl0144
159.1 SEQID=6799 EMBL_NAME=lpl0143
1697.1 SEQID=6800 EMBL_NAME=lpl0718
1718.1 SEQID=6801 EMBL_NAME=lpl0729
1824.1 SEQID=6803 EMBL_NAME=lpl0801
1826.1 SEQID=6805 EMBL_NAME=lpl0803
1827.1 SEQID=6806 EMBL_NAME=lpl0804
1834.1 SEQID=6809 EMBL_NAME=lpl0811
1955.1 SEQID=6810 EMBL_NAME=lpl0904
2110.1 SEQID=6812 EMBL_NAME=lpl1015
2137.1 SEQID=6816 EMBL_NAME=lpl1037
2141.1 SEQID=6818 EMBL_NAME=lpl1041
2142.2 SEQID=6819 EMBL_NAME=lpl1042
2145.1 SEQID=6821 EMBL_NAME=lpl1043a
2152.2 SEQID=6823 EMBL_NAME=lpl1048
2155.1 SEQID=6824 EMBL_NAME=lpl1050
2170.1 SEQID=6827 EMBL_NAME=lpl1059
2180.1 SEQID=6828 EMBL_NAME=lpl1067
2181.1 SEQID=6829 EMBL_NAME=lpl1068
2182.1 SEQID=6830 EMBL_NAME=lpl1069
2184.1 SEQID=6831 EMBL_NAME=lpl1070
2186.2 SEQID=6832 EMBL_NAME=lpl1071
2189.1 SEQID=6833 EMBL_NAME=lpl1073
2194.1 SEQID=6835 EMBL_NAME=lpl1076
2195.1 SEQID=6836 EMBL_NAME=lpl1077
2198.1 SEQID=6837 EMBL_NAME=lpl1080
2199.1 SEQID=6838 EMBL_NAME=lpl1081
2200.1 SEQID=6839 EMBL_NAME=lpl1082
2201.1 SEQID=6840 EMBL_NAME=lpl1083
2202.1 SEQID=6841 EMBL_NAME=lpl1084
2203.1 SEQID=6842 EMBL_NAME=lpl1085
2206.1 SEQID=6844 EMBL_NAME=lpl1087
2207.1 SEQID=6845 EMBL_NAME=lpl1088
2208.1 SEQID=6846 EMBL_NAME=lpl1089
2210.1 SEQID=6848 EMBL_NAME=lpl1091
2211.1 SEQID=6849 EMBL_NAME=lpl1092
2213.1 SEQID=6850 EMBL_NAME=lpl1093
2224.1 SEQID=6851 EMBL_NAME=lpl1101
2241.1 SEQID=6853 EMBL_NAME=lpl0199
2242.1 SEQID=6854 EMBL_NAME=lpl0200
2245.1 SEQID=6856 EMBL_NAME=lpl0202
2253.1 SEQID=6860 EMBL_NAME=lpl0207
2259.1 SEQID=6861 EMBL_NAME=lpl0211
2260.1 SEQID=6862 EMBL_NAME=lpl0212
2440.1 SEQID=6866 EMBL_NAME=lpl2546
2441.1 SEQID=6867 EMBL_NAME=lpl2545
2516.1 SEQID=6868 EMBL_NAME=lpl2497
2518.1 SEQID=6870 EMBL_NAME=lpl2495
2521.1 SEQID=6871 EMBL_NAME=lpl2494
2523.1 SEQID=6872 EMBL_NAME=lpl2493
2524.1 SEQID=6873 EMBL_NAME=lpl2492
2525.1 SEQID=6874 EMBL_NAME=lpl2491
2526.1 SEQID=6875 EMBL_NAME=lpl2490
2527.1 SEQID=6876 EMBL_NAME=lpl2489
2529.1 SEQID=6877 EMBL_NAME=lpl2488
2530.1 SEQID=6878 EMBL_NAME=lpl2487
2531.1 SEQID=6879 EMBL_NAME=lpl2486
2532.1 SEQID=6880 EMBL_NAME=lpl2485
2533.1 SEQID=6881 EMBL_NAME=lpl2484
2534.1 SEQID=6882 EMBL_NAME=lpl2483
2540.1 SEQID=6884 EMBL_NAME=lpl2477
2541.1 SEQID=6885 EMBL_NAME=lpl2476
2547.1 SEQID=6886 EMBL_NAME=lpl2472
2584.1 SEQID=6888 EMBL_NAME=lpl2445
2640.1 SEQID=6890 EMBL_NAME=lpl2399
2658.1 SEQID=6891 EMBL_NAME=lpl2385
266.1 SEQID=6892 EMBL_NAME=lpl0069
267.1 SEQID=6894 EMBL_NAME=lpl0068
269.1 SEQID=6895 EMBL_NAME=lpl0067
270.1 SEQID=6897 EMBL_NAME=lpl0066
2701.1 SEQID=6898 EMBL_NAME=lpl2354
2708.1 SEQID=6900 EMBL_NAME=lpl2350
2717.1 SEQID=6901 EMBL_NAME=lpl2344
2719.1 SEQID=6902 EMBL_NAME=lpl2343
272.1 SEQID=6903 EMBL_NAME=lpl0065
2720.1 SEQID=6904 EMBL_NAME=lpl2342
2722.1 SEQID=6905 EMBL_NAME=lpl2341
2723.1 SEQID=6906 EMBL_NAME=lpl2340
273.2 SEQID=6907 EMBL_NAME=lpl1893
2738.1 SEQID=6908 EMBL_NAME=lpl2330
2749.1 SEQID=6909 EMBL_NAME=lpl2323
2775.1 SEQID=6910 EMBL_NAME=lpl2309
2782.1 SEQID=6912 EMBL_NAME=lpl2305
2795.1 SEQID=6913 EMBL_NAME=lpl2295
2796.1 SEQID=6914 EMBL_NAME=lpl2294
2798.1 SEQID=6915 EMBL_NAME=lpl2293
2801.1 SEQID=6916 EMBL_NAME=lpl2292
2802.1 SEQID=6917 EMBL_NAME=lpl2291
2805.1 SEQID=6918 EMBL_NAME=lpl2289
2806.1 SEQID=6919 EMBL_NAME=lpl2288
2808.1 SEQID=6921 EMBL_NAME=lpl2286
2810.1 SEQID=6923 EMBL_NAME=lpl2284
3002.1 SEQID=6925 EMBL_NAME=lpl2148
3042.1 SEQID=6926 EMBL_NAME=lpl2114
3054.1 SEQID=6927 EMBL_NAME=lpl2107
3056.1 SEQID=6928 EMBL_NAME=lpl2106
3057.1 SEQID=6929 EMBL_NAME=lpl2105
3064.1 SEQID=6930 EMBL_NAME=lpl2100
3065.1 SEQID=6931 EMBL_NAME=lpl2099
3066.1 SEQID=6932 EMBL_NAME=lpl2098
3140.1 SEQID=6933 EMBL_NAME=lpl2049
3155.1 SEQID=6935 EMBL_NAME=lpl2038
3156.1 SEQID=6936 EMBL_NAME=lpl2037
3158.1 SEQID=6937 EMBL_NAME=lpl2036
3160.1 SEQID=6938 EMBL_NAME=lpl2035
3161.1 SEQID=6939 EMBL_NAME=lpl2034
3162.1 SEQID=6940 EMBL_NAME=lpl2033
3417.1 SEQID=6942 EMBL_NAME=lpl0216
3420.1 SEQID=6943 EMBL_NAME=lpl0217
3422.1 SEQID=6944 EMBL_NAME=lpl0218
3435.1 SEQID=6945 EMBL_NAME=lpl0226
3728.1 SEQID=6948 EMBL_NAME=lpl0552
3743.2 SEQID=6949 EMBL_NAME=lpl0565
3744.1 SEQID=6950 EMBL_NAME=lpl0566
3745.1 SEQID=6951 EMBL_NAME=lpl0567
3747.2 SEQID=6952 EMBL_NAME=lpl0568a
3748.2 SEQID=6953 EMBL_NAME=lpl0568b
3815.2 SEQID=6957 EMBL_NAME=plpl0037
3818.1 SEQID=6959 EMBL_NAME=plpl0039
3820.1 SEQID=6960 EMBL_NAME=plpl0040
3822.1 SEQID=6961 EMBL_NAME=plpl0042
3823.1 SEQID=6962 EMBL_NAME=plpl0043
3843.1 SEQID=6972 EMBL_NAME=plpl0053
3848.2 SEQID=6973 EMBL_NAME=plpl0001
3850.1 SEQID=6974 EMBL_NAME=plpl0002
3863.1 SEQID=6978 EMBL_NAME=plpl0014
3865.1 SEQID=6979 EMBL_NAME=plpl0015
3866.1 SEQID=6980 EMBL_NAME=plpl0016
3868.1 SEQID=6981 EMBL_NAME=plpl0017
3870.1 SEQID=6982 EMBL_NAME=plpl0018
3871.1 SEQID=6983 EMBL_NAME=plpl0019
3873.1 SEQID=6984 EMBL_NAME=plpl0020
3874.1 SEQID=6985 EMBL_NAME=plpl0021
3875.1 SEQID=6986 EMBL_NAME=plpl0022
3877.1 SEQID=6987 EMBL_NAME=plpl0023
3878.1 SEQID=6988 EMBL_NAME=plpl0024
3880.1 SEQID=6989 EMBL_NAME=plpl0025
3881.1 SEQID=6990 EMBL_NAME=plpl0026
3882.1 SEQID=6991 EMBL_NAME=plpl0027
3884.1 SEQID=6992 EMBL_NAME=plpl0028
3886.2 SEQID=6993 EMBL_NAME=plpl0029
3888.1 SEQID=6994 EMBL_NAME=plpl0030
3890.1 SEQID=6995 EMBL_NAME=plpl0031
3891.2 SEQID=6996 EMBL_NAME=plpl0032
3893.1 SEQID=6997 EMBL_NAME=plpl0033
3894.1 SEQID=6998 EMBL_NAME=plpl0034
3949.1 SEQID=7002 EMBL_NAME=lpl1158
3976.1 SEQID=7003 EMBL_NAME=lpl1138
3987.1 SEQID=7004 EMBL_NAME=lpl1132
4007.1 SEQID=7006 EMBL_NAME=lpl1116
4172.1 SEQID=7014 EMBL_NAME=lpl0194
4173.1 SEQID=7015 EMBL_NAME=lpl0193
4175.1 SEQID=7017 EMBL_NAME=lpl0191
4181.1 SEQID=7019 EMBL_NAME=lpl0189
4182.1 SEQID=7020 EMBL_NAME=lpl0188
4185.1 SEQID=7021 EMBL_NAME=lpl0187
4189.2 SEQID=7023 EMBL_NAME=lpl1424
4196.1 SEQID=7024 EMBL_NAME=lpl1417
4197.1 SEQID=7025 EMBL_NAME=lpl1416
4248.1 SEQID=7030 EMBL_NAME=lpl1942
4250.1 SEQID=7031 EMBL_NAME=lpl1943
4251.1 SEQID=7032 EMBL_NAME=lpl1944
4252.1 SEQID=7033 EMBL_NAME=lpl1945
4253.1 SEQID=7034 EMBL_NAME=lpl1946
4254.1 SEQID=7035 EMBL_NAME=lpl1947
560.1 SEQID=7037 EMBL_NAME=lpl1681
561.1 SEQID=7038 EMBL_NAME=lpl1680
562.1 SEQID=7039 EMBL_NAME=lpl1679
564.1 SEQID=7040 EMBL_NAME=lpl1678
688.1 SEQID=7042 EMBL_NAME=lpl1588
689.1 SEQID=7043 EMBL_NAME=lpl1587
692.1 SEQID=7045 EMBL_NAME=lpl1585
699.1 SEQID=7048 EMBL_NAME=lpl1581
700.1 SEQID=7049 EMBL_NAME=lpl1580
703.1 SEQID=7050 EMBL_NAME=lpl1579
709.1 SEQID=7053 EMBL_NAME=lpl1575
711.1 SEQID=7054 EMBL_NAME=lpl1574
760.1 SEQID=7055 EMBL_NAME=lpl1537
761.1 SEQID=7056 EMBL_NAME=lpl1536
898.1 SEQID=7057 EMBL_NAME=lpl1425
995.2 SEQID=7059 EMBL_NAME=lpl1933
997.1 SEQID=7060 EMBL_NAME=lpl1931
2144.1 SEQID=6820 EMBL_NAME=lpl1043b
Table XIX: List of the sequences present in the Paris and Lens strains though absent in
the Philadelphia strain, with their Pasteur Institute « ORF » correspondence number and
accession number in the gene banks



1082.3 SEQID=3560 EMBL_NAME=lpp1844
1090.1 SEQID=3565 EMBL_NAME=lpp1061
1156.3 SEQID=3606 EMBL_NAME=lpp1106
119.1 SEQID=3625 EMBL_NAME=lpp2953
121.1 SEQID=3635 EMBL_NAME=lpp2952
1225.2 SEQID=6596 EMBL_NAME=plpp0013
1226.2 SEQID=6597 EMBL_NAME=plpp0012
131.2 SEQID=3694 EMBL_NAME=lpp1099
1469.4 SEQID=3786 EMBL_NAME=lpp1843
15.1 SEQID=3803 EMBL_NAME=lpp0194
1560.2 SEQID=3842 EMBL_NAME=lpp2477
16.1 SEQID=3869 EMBL_NAME=lpp0193
1737.2 SEQID=3956 EMBL_NAME=lpp1863
1875.2 SEQID=4045 EMBL_NAME=lpp2529
2.1 SEQID=4112 EMBL_NAME=lpp0163
2026.1 SEQID=4124 EMBL_NAME=lpp1909
2039.1 SEQID=4131 EMBL_NAME=lpp0243
2275.4 SEQID=6632 EMBL_NAME=plpp0014
2357.4 SEQID=4299 EMBL_NAME=lpp2054
2427.4 SEQID=4348 EMBL_NAME=lpp1869
2453.4 SEQID=4367 EMBL_NAME=lpp2450
2649.1 SEQID=4483 EMBL_NAME=lpp0667
321.3 SEQID=4804 EMBL_NAME=lpp2981
3248.1 SEQID=4829 EMBL_NAME=lpp2478
33.1 SEQID=4861 EMBL_NAME=lpp0183
3395.3 SEQID=4911 EMBL_NAME=lpp1042
3396.1 SEQID=4912 EMBL_NAME=lpp1043
3401.2 SEQID=4917 EMBL_NAME=lpp1047
341.6 SEQID=4921 EMBL_NAME=lpp0024
3413.2 SEQID=4923 EMBL_NAME=lpp2060
3414.3 SEQID=4924 EMBL_NAME=lpp2061
3499.1 SEQID=6673 EMBL_NAME=plpp0011
3500.3 SEQID=6674 EMBL_NAME=plpp0010
3563.3 SEQID=5023 EMBL_NAME=lpp0158
3594.1 SEQID=5041 EMBL_NAME=lpp0639
3600.2 SEQID=5048 EMBL_NAME=lpp2449
3601.1 SEQID=5049 EMBL_NAME=lpp2448
3657.1 SEQID=5085 EMBL_NAME=lpp2419
3734.1 SEQID=5134 EMBL_NAME=lpp0208
3744.2 SEQID=5142 EMBL_NAME=lpp1850
3763.1 SEQID=5152 EMBL_NAME=lpp1867
3871.1 SEQID=5211 EMBL_NAME=lpp2070
3872.1 SEQID=5212 EMBL_NAME=lpp2069
3878.1 SEQID=5215 EMBL_NAME=lpp2066
4039.2 SEQID=5315 EMBL_NAME=lpp0064
4040.1 SEQID=5316 EMBL_NAME=lpp0063
4045.2 SEQID=5320 EMBL_NAME=lpp0059
417.3 SEQID=5412 EMBL_NAME=lpp1907
4276.2 SEQID=5479 EMBL_NAME=lpp2048
4532.2 SEQID=5653 EMBL_NAME=lpp2049
4763.2 SEQID=5803 EMBL_NAME=lpp1088
4764.2 SEQID=5804 EMBL_NAME=lpp1087
5056.3 SEQID=5980 EMBL_NAME=lpp1578
5058.2 SEQID=5981 EMBL_NAME=lpp1579
5059.3 SEQID=5982 EMBL_NAME=lpp1580
506.3 SEQID=5983 EMBL_NAME=lpp2417
5080.6 SEQID=6686 EMBL_NAME=plpp0006
5087.2 SEQID=5999 EMBL_NAME=lpp2016
5106.3 SEQID=6688 EMBL_NAME=plpp0007
5147.1 SEQID=6691 EMBL_NAME=plpp0009
5176.1 SEQID=6026 EMBL_NAME=lpp0712
5382.1 SEQID=6107 EMBL_NAME=lpp1086
5388.1 SEQID=6110 EMBL_NAME=lpp0084
5404.2 SEQID=6117 EMBL_NAME=lpp2443
5504.4 SEQID=6163 EMBL_NAME=lpp2053
553.1 SEQID=6173 EMBL_NAME=lpp2920
5584.2 SEQID=6200 EMBL_NAME=lpp1450
5609.1 SEQID=6214 EMBL_NAME=lpp2153
58.1 SEQID=6273 EMBL_NAME=lpp0165
6036.1 SEQID=6322 EMBL_NAME=lpp1449
650.4 SEQID=6385 EMBL_NAME=lpp0668
651.3 SEQID=6386 EMBL_NAME=lpp0669
860.2 SEQID=6495 EMBL_NAME=lpp2058
9.2 SEQID=6521 EMBL_NAME=lpp0159



Table XX: List of the sequences present in the Paris and Philadelphia strains though
absent in the Lens strain, with their Pasteur Institute « ORF » correspondence number
and accession number in the gene banks



102.1 SEQID=3519 EMBL_NAME=lpp2364
103.1 SEQID=3525 EMBL_NAME=lpp2363
104.1 SEQID=3533 EMBL_NAME=lpp2362
107.1 SEQID=3553 EMBL_NAME=lpp2361
109.1 SEQID=3564 EMBL_NAME=lpp2360
1107.2 SEQID=3578 EMBL_NAME=lpp1601
1109.3 SEQID=3579 EMBL_NAME=lpp1600
111.2 SEQID=3580 EMBL_NAME=lpp2359
1111.3 SEQID=3581 EMBL_NAME=lpp1599
1211.3 SEQID=3637 EMBL_NAME=lpp0258
1236.2 SEQID=6599 EMBL_NAME=plpp0120
1334.3 SEQID=3707 EMBL_NAME=lpp0094
1335.2 SEQID=3708 EMBL_NAME=lpp0095
1423.2 SEQID=3763 EMBL_NAME=lpp2328
1424.2 SEQID=3764 EMBL_NAME=lpp2329
1425.2 SEQID=3765 EMBL_NAME=lpp2330
1432.3 SEQID=3767 EMBL_NAME=lpp2441
152.3 SEQID=3818 EMBL_NAME=lpp2354
1548.2 SEQID=3834 EMBL_NAME=lpp2331
1549.1 SEQID=3835 EMBL_NAME=lpp2332
1550.2 SEQID=3836 EMBL_NAME=lpp2333
158.2 SEQID=3854 EMBL_NAME=lpp2341
159.1 SEQID=3862 EMBL_NAME=lpp2342
160.1 SEQID=3870 EMBL_NAME=lpp2343
1628.3 SEQID=3885 EMBL_NAME=lpp2339
1629.1 SEQID=3886 EMBL_NAME=lpp2338
1631.3 SEQID=3888 EMBL_NAME=lpp2337
1635.4 SEQID=3891 EMBL_NAME=lpp1309
1639.4 SEQID=3894 EMBL_NAME=lpp1308
168.1 SEQID=3920 EMBL_NAME=lpp2346
1682.4 SEQID=3922 EMBL_NAME=lpp0045
169.1 SEQID=3927 EMBL_NAME=lpp2347
1703.1 SEQID=3936 EMBL_NAME=lpp1942
1775.3 SEQID=3983 EMBL_NAME=lpp1130
1847.3 SEQID=4029 EMBL_NAME=lpp1089
1887.2 SEQID=4053 EMBL_NAME=lpp1940
190.3 SEQID=4062 EMBL_NAME=lpp0234
1960.2 SEQID=4096 EMBL_NAME=lpp2340
1961.3 SEQID=4097 EMBL_NAME=lpp2355
2031.2 SEQID=4128 EMBL_NAME=lpp0330
2054.2 SEQID=4138 EMBL_NAME=lpp2603
2079.2 SEQID=4153 EMBL_NAME=lpp2455
2143.5 SEQID=4186 EMBL_NAME=lpp2615
2169.2 SEQID=4205 EMBL_NAME=lpp1890
227.2 SEQID=4247 EMBL_NAME=lpp2358
228.1 SEQID=4253 EMBL_NAME=lpp2357
2544.3 SEQID=4425 EMBL_NAME=lpp2192
2591.3 SEQID=4451 EMBL_NAME=lpp2779
2637.1 SEQID=4474 EMBL_NAME=lpp2336
2639.1 SEQID=4475 EMBL_NAME=lpp2327
2646.1 SEQID=4480 EMBL_NAME=lpp0673
2730.1 SEQID=4526 EMBL_NAME=lpp0251
2808.1 SEQID=4572 EMBL_NAME=lpp0321
2849.1 SEQID=4596 EMBL_NAME=lpp1912
2938.4 SEQID=4636 EMBL_NAME=lpp2883
2991.1 SEQID=4675 EMBL_NAME=lpp2110
2992.2 SEQID=4676 EMBL_NAME=lpp2109
3163.1 SEQID=4778 EMBL_NAME=lpp0331
3190.1 SEQID=4795 EMBL_NAME=lpp1007
3191.1 SEQID=4796 EMBL_NAME=lpp1006
3205.3 SEQID=4802 EMBL_NAME=lpp2131
3207.5 SEQID=4803 EMBL_NAME=lpp2132
3250.1 SEQID=4832 EMBL_NAME=lpp2474
3508.2 SEQID=4982 EMBL_NAME=lpp2780
3588.2 SEQID=5036 EMBL_NAME=lpp1090
3663.1 SEQID=5090 EMBL_NAME=lpp3017
3699.3 SEQID=5114 EMBL_NAME=lpp1404
3701.1 SEQID=5116 EMBL_NAME=lpp1403
371.1 SEQID=5123 EMBL_NAME=lpp2039
3783.1 SEQID=5160 EMBL_NAME=lpp1144
3822.2 SEQID=5187 EMBL_NAME=lpp0851
384.5 SEQID=5197 EMBL_NAME=lpp0716
3884.2 SEQID=5216 EMBL_NAME=lpp2440
402.2 SEQID=5304 EMBL_NAME=lpp1557
4030.1 SEQID=5310 EMBL_NAME=lpp0096
4083.4 SEQID=5348 EMBL_NAME=lpp1603
4084.3 SEQID=5349 EMBL_NAME=lpp1602
4220.2 SEQID=5443 EMBL_NAME=lpp1852
423.2 SEQID=5449 EMBL_NAME=lpp2432
424.1 SEQID=5457 EMBL_NAME=lpp2431
4242.3 SEQID=5458 EMBL_NAME=lpp1405
4249.1 SEQID=5464 EMBL_NAME=lpp3009
4453.2 SEQID=5597 EMBL_NAME=lpp1447
4517.2 SEQID=5640 EMBL_NAME=lpp2497
4528.2 SEQID=5649 EMBL_NAME=lpp2052
4530.1 SEQID=5651 EMBL_NAME=lpp2051
4559.1 SEQID=5667 EMBL_NAME=lpp0287
4591.2 SEQID=5689 EMBL_NAME=lpp2508
4819.2 SEQID=5845 EMBL_NAME=lpp2498
4965.2 SEQID=5926 EMBL_NAME=lpp0829c
4966.2 SEQID=5927 EMBL_NAME=lpp0830
5060.2 SEQID=5984 EMBL_NAME=lpp0835
5289.2 SEQID=6070 EMBL_NAME=lpp0589
5328.1 SEQID=6088 EMBL_NAME=lpp1565
5337.1 SEQID=6090 EMBL_NAME=lpp2896
5340.1 SEQID=6091 EMBL_NAME=lpp1947
538.1 SEQID=6105 EMBL_NAME=lpp0859
5496.2 SEQID=6157 EMBL_NAME=lpp0829b
5656.2 SEQID=6238 EMBL_NAME=lpp2887
5723.2 SEQID=6261 EMBL_NAME=lpp1944
5871.4 SEQID=6288 EMBL_NAME=lpp2037
590.4 SEQID=6299 EMBL_NAME=lpp2886
592.3 SEQID=6305 EMBL_NAME=lpp2348
5920.3 SEQID=6306 EMBL_NAME=lpp2311
593.2 SEQID=6308 EMBL_NAME=lpp2349
594.1 SEQID=6309 EMBL_NAME=lpp2350
5999.1 SEQID=6317 EMBL_NAME=lpp0039
6002.2 SEQID=6318 EMBL_NAME=lpp0881
6110.1 SEQID=6334 EMBL_NAME=lpp0829a
615.5 SEQID=6337 EMBL_NAME=lpp1562
6151.1 SEQID=6338 EMBL_NAME=lpp0038
6159.1 SEQID=6339 EMBL_NAME=lpp2410
6160.1 SEQID=6341 EMBL_NAME=lpp2402
6178.1 SEQID=6343 EMBL_NAME=lpp0210
6180.1 SEQID=6345 EMBL_NAME=lpp1035
6186.1 SEQID=6346 EMBL_NAME=lpp0882
6195.2 SEQID=6350 EMBL_NAME=lpp0717
6285.1 SEQID=6362 EMBL_NAME=lpp2895
6309.1 SEQID=6364 EMBL_NAME=lpp1945
6318.1 SEQID=6366 EMBL_NAME=lpp1851
6320.1 SEQID=6368 EMBL_NAME=lpp1813
6322.1 SEQID=6369 EMBL_NAME=lpp1812
743.4 SEQID=6432 EMBL_NAME=lpp2368
744.4 SEQID=6433 EMBL_NAME=lpp2369
818.2 SEQID=6469 EMBL_NAME=lpp2335
819.2 SEQID=6470 EMBL_NAME=lpp2334
864.1 SEQID=6498 EMBL_NAME=lpp2055
901.2 SEQID=6525 EMBL_NAME=lpp2471
938.3 SEQID=6550 EMBL_NAME=lpp1253
96.6 SEQID=6562 EMBL_NAME=lpp2367
97.2 SEQID=6568 EMBL_NAME=lpp2366
979.3 SEQID=6574 EMBL_NAME=lpp2494
980.1 SEQID=6575 EMBL_NAME=lpp2495
981.2 SEQID=6576 EMBL_NAME=lpp2496



Table XXI: List of the sequences present in the Philadelphia and Lens strains though
absent in the Paris strain, with their Pasteur Institute « ORF » correspondence number
and accession number in the gene banks



1038.1 SEQID=6741 EMBL_NAME=lpl1901
1043.1 SEQID=6744 EMBL_NAME=lpl1897
1073.1 SEQID=6753 EMBL_NAME=lpl0044
1130.1 SEQID=6759 EMBL_NAME=lpl2933
117.1 SEQID=6762 EMBL_NAME=lpl0171
124.1 SEQID=6771 EMBL_NAME=lpl0167
125.1 SEQID=6772 EMBL_NAME=lpl0166
1282.1 SEQID=6785 EMBL_NAME=lpl2836
1434.1 SEQID=6793 EMBL_NAME=lpl2732
148.1 SEQID=6794 EMBL_NAME=lpl0150
175.1 SEQID=6802 EMBL_NAME=lpl0132
1825.1 SEQID=6804 EMBL_NAME=lpl0802
1828.1 SEQID=6807 EMBL_NAME=lpl0805
1829.1 SEQID=6808 EMBL_NAME=lpl0806
2100.1 SEQID=6811 EMBL_NAME=lpl1006
2134.1 SEQID=6813 EMBL_NAME=lpl1034
2135.1 SEQID=6814 EMBL_NAME=lpl1035
2136.1 SEQID=6815 EMBL_NAME=lpl1036
2139.1 SEQID=6817 EMBL_NAME=lpl1039
2151.1 SEQID=6822 EMBL_NAME=lpl1047
2157.1 SEQID=6825 EMBL_NAME=lpl1051
2168.1 SEQID=6826 EMBL_NAME=lpl1058
2192.1 SEQID=6834 EMBL_NAME=lpl1075
2205.1 SEQID=6843 EMBL_NAME=lpl1086
2209.1 SEQID=6847 EMBL_NAME=lpl1090
2238.1 SEQID=6852 EMBL_NAME=lpl1110
2244.1 SEQID=6855 EMBL_NAME=lpl0201
2246.1 SEQID=6857 EMBL_NAME=lpl0203
2247.1 SEQID=6858 EMBL_NAME=lpl0204
2248.1 SEQID=6859 EMBL_NAME=lpl0205
2261.1 SEQID=6863 EMBL_NAME=lpl0213
2392.1 SEQID=6864 EMBL_NAME=lpl2580
2422.1 SEQID=6865 EMBL_NAME=lpl2558
2517.1 SEQID=6869 EMBL_NAME=lpl2496
2539.1 SEQID=6883 EMBL_NAME=lpl2478
2558.1 SEQID=6887 EMBL_NAME=lpl2465
2587.1 SEQID=6889 EMBL_NAME=lpl2443
2660.1 SEQID=6893 EMBL_NAME=lpl2384
2696.1 SEQID=6896 EMBL_NAME=lpl2358
2706.1 SEQID=6899 EMBL_NAME=lpl2351
2777.1 SEQID=6911 EMBL_NAME=lpl2308
2807.1 SEQID=6920 EMBL_NAME=lpl2287
2809.1 SEQID=6922 EMBL_NAME=lpl2285
293.1 SEQID=6924 EMBL_NAME=lpl1879
3151.1 SEQID=6934 EMBL_NAME=lpl2042
322.1 SEQID=6941 EMBL_NAME=lpl1857
3536.1 SEQID=6946 EMBL_NAME=lpl0286
3537.1 SEQID=6947 EMBL_NAME=lpl0287
3749.2 SEQID=6954 EMBL_NAME=lpl0569
3788.1 SEQID=6955 EMBL_NAME=lpl0593
3793.1 SEQID=6956 EMBL_NAME=lpl0596
3816.1 SEQID=6958 EMBL_NAME=plpl0038
3827.1 SEQID=6963 EMBL_NAME=plpl0044
3828.1 SEQID=6964 EMBL_NAME=plpl0045
3831.1 SEQID=6965 EMBL_NAME=plpl0046
3835.1 SEQID=6966 EMBL_NAME=plpl0047
3836.1 SEQID=6967 EMBL_NAME=plpl0048
3837.1 SEQID=6968 EMBL_NAME=plpl0049
3839.1 SEQID=6969 EMBL_NAME=plpl0050
3840.1 SEQID=6970 EMBL_NAME=plpl0051
3841.1 SEQID=6971 EMBL_NAME=plpl0052
3859.1 SEQID=6975 EMBL_NAME=plpl0011
3861.1 SEQID=6976 EMBL_NAME=plpl0012
3862.1 SEQID=6977 EMBL_NAME=plpl0013
3895.1 SEQID=6999 EMBL_NAME=plpl0035
3896.1 SEQID=7000 EMBL_NAME=plpl0036
3937.1 SEQID=7001 EMBL_NAME=lpl1165
3995.1 SEQID=7005 EMBL_NAME=lpl1125
4011.1 SEQID=7007 EMBL_NAME=lpl1113
4085.1 SEQID=7008 EMBL_NAME=lpl0408
4093.1 SEQID=7009 EMBL_NAME=lpl0415
4111.1 SEQID=7010 EMBL_NAME=lpl0432
4169.1 SEQID=7011 EMBL_NAME=lpl0197
4170.1 SEQID=7012 EMBL_NAME=lpl0196
4171.2 SEQID=7013 EMBL_NAME=lpl0195
4174.1 SEQID=7016 EMBL_NAME=lpl0192
4177.1 SEQID=7018 EMBL_NAME=lpl0190
4188.1 SEQID=7022 EMBL_NAME=lpl0185
4203.1 SEQID=7026 EMBL_NAME=lpl1412
4229.2 SEQID=7027 EMBL_NAME=lpl1393
4236.2 SEQID=7028 EMBL_NAME=lpl1965
4237.1 SEQID=7029 EMBL_NAME=lpl1934
442.1 SEQID=7036 EMBL_NAME=lpl1768
566.1 SEQID=7041 EMBL_NAME=lpl1676
690.1 SEQID=7044 EMBL_NAME=lpl1586
694.1 SEQID=7046 EMBL_NAME=lpl1584
697.1 SEQID=7047 EMBL_NAME=lpl1582
705.1 SEQID=7051 EMBL_NAME=lpl1578
707.1 SEQID=7052 EMBL_NAME=lpl1577
899.2 SEQID=7058 EMBL_NAME=lpl2032